6. Imbalanced Learning
Contents
This one-hour workshop serves as a gentle introduction to the field of imbalanced learning. Participants will learn not only about the peculiarities and implications of working with imbalanced data sets but also how to address class imbalance within a typical machine learning pipeline.
More specifically, after this workshop participants will:
- be able to identify an imbalanced data set and know about its implications for modeling,
- carry a toolbox of techniques for addressing class imbalance at various stages of their machine learning pipeline (e.g., data collection, resampling, model estimation, or model evaluation),
- have internalized basic resampling techniques (random under- and oversampling) as well as more advanced ones (SMOTE, Borderline SMOTE, NearMiss), previewed in the sketch after this list,
- be able to distinguish alternative routes to handling class imbalance, such as imbalanced learning and cost-sensitive learning (a companion sketch follows the agenda below).
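As a preview of the resampling strategies covered in section 3.1, below is a minimal sketch using the recipes and themis packages. This is an illustration under assumptions, not the workshop's own materials: the simulated data, ratios, and neighbor counts are made up for demonstration.

```r
# A minimal sketch; assumes the 'recipes' and 'themis' packages are
# installed. Data and parameter values are illustrative only.
library(recipes)
library(themis)

set.seed(42)

# Simulated two-class data: the minority class (~5% of cases) sits
# where x1 is large, so there is real signal to learn
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
df <- data.frame(
  x1, x2,
  class = factor(ifelse(x1 + 0.5 * rnorm(n) > 1.8, "minority", "majority"))
)
table(df$class)

# Random oversampling: duplicate minority rows up to a 1:1 ratio
rec_up <- recipe(class ~ ., data = df) |>
  step_upsample(class, over_ratio = 1)

# SMOTE: synthesize minority rows by interpolating between a minority
# case and one of its nearest minority neighbors
rec_smote <- recipe(class ~ ., data = df) |>
  step_smote(class, over_ratio = 1, neighbors = 5)

# Borderline SMOTE: like SMOTE, but only synthesizes around minority
# cases that sit near the class boundary
rec_bsmote <- recipe(class ~ ., data = df) |>
  step_bsmote(class, over_ratio = 1)

# Random undersampling: drop majority rows down to a 1:1 ratio
rec_down <- recipe(class ~ ., data = df) |>
  step_downsample(class, under_ratio = 1)

# Informed NearMiss undersampling: keep the majority rows closest to
# the minority class instead of dropping at random
rec_nearmiss <- recipe(class ~ ., data = df) |>
  step_nearmiss(class, under_ratio = 1)

# prep() estimates each step; bake(new_data = NULL) returns the
# resampled training set (these steps are skipped for new/test data)
balanced <- bake(prep(rec_smote), new_data = NULL)
table(balanced$class)
```

The same recipe objects plug into a tidymodels workflow, so the resampling can happen inside resampling-based model evaluation rather than once up front.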
Agenda
1 Learning Objectives
2 Introduction to Imbalanced Learning
3 Techniques for Addressing Class Imbalance
3.1 Sampling Strategies
3.1.1 Random Oversampling
3.1.2 Synthetic Minority Over-Sampling Technique (SMOTE)
3.1.3 Borderline SMOTE
3.1.4 Random Undersampling
3.1.5 Informed NearMiss Undersampling
3.2 Case Weighting
3.3 Excursus: Evaluation of Classification Models
3.4 Tinkering with Classification Cutoffs
4 Cost-Sensitive Learning
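For sections 3.2 and 4, here is the companion sketch of case weighting and cost-sensitive learning, again an illustration under assumptions rather than workshop code: it reuses the simulated df from the sketch above, takes rpart as an example learner, and the weights and costs are made-up values.

```r
# A minimal sketch with 'rpart'; reuses the simulated 'df' from the
# resampling sketch above. Weight and cost values are illustrative.
library(rpart)

# Case weighting: weight each observation by the inverse frequency of
# its class, so the minority class carries as much total weight as
# the majority class
w <- as.numeric(1 / table(df$class)[df$class])
fit_weighted <- rpart(class ~ ., data = df, weights = w)

# Cost-sensitive learning: leave the data and weights alone and make
# the loss asymmetric instead. In rpart's loss matrix, rows are true
# classes and columns are predicted classes (level order: majority,
# minority); the 10x cost for missing a minority case is an assumption
costs <- matrix(
  c(0, 10,   # predicted "majority": correct (0) / missed minority (10)
    1, 0),   # predicted "minority": false alarm (1) / correct (0)
  nrow = 2
)
fit_cost <- rpart(class ~ ., data = df, parms = list(loss = costs))

# Both trees should flag minority cases more readily than an
# unweighted fit on the raw imbalanced data
table(predict(fit_weighted, type = "class"))
table(predict(fit_cost, type = "class"))
```

The design difference matters: the resampling recipes change the training data itself, while case weights and loss matrices leave the data untouched and change what the model optimizes. That distinction is the line between imbalanced learning and cost-sensitive learning drawn in section 4.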