5. Feature Engineering

Contents

This one-hour workshop covers the basic concepts and techniques of feature engineering. It introduces a toolbox of numerical and visual techniques for inspecting feature sets and, ultimately, improving the predictive performance of machine learning models.

More specifically, after this workshop, participants will

  • be familiar with the broad field of feature engineering, including feature transformation, feature extraction and feature selection,
  • be able to detect potential pitfalls in the representation of categorical (fct) and numerical (int, dbl) features and know how to mitigate them,
  • know important techniques for investigating and handling outliers as well as missing values,
  • be aware of the concept of data leakage and how to prevent it, and
  • have a good overview of the different approaches to feature selection.

Agenda

1 Learning Objectives

2 The Generic Machine Learning Pipeline

3 Introduction to Feature Engineering

4 Engineering of Categorical and Numerical Features

4.1 Engineering of Categorical Features
4.2 Engineering of Numerical Features
4.3 Handling of Extreme Values
4.4 Handling of Missing Values

5 Excursus: Data Leakage

6 Feature Selection

6.1 Intrinsic Methods
6.2 Filter and Wrapper Methods

7 Outlook