Dealing with Missing Data in Machine Learning
How to Deal with Missing Data
- Remove
  - Listwise deletion: remove every record that contains a missing value. This is safe only if the values are missing at random; otherwise the remaining records form a biased sample.
  - Removing values causes problems in several respects. For example, we cannot simply delete records when applying our models, because the model still has to produce a prediction for them.
- Replace
  - with the most frequent value
  - with a measure of central tendency: median, mean, etc.
  - with a fixed value: a placeholder string, etc.
- New category: define a new category for missing data
- Convert the column to a binary-valued column indicating whether the feature is missing.
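The options above can be sketched with pandas. This is a minimal illustration, not a recommended pipeline: the toy DataFrame, its column names (age, city), and the placeholder label "missing" are all invented for the example.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Oslo", "Bergen", None, "Oslo"],
})

# Remove: listwise deletion drops every row with at least one missing value.
dropped = df.dropna()

# Replace: median for the numeric column, most frequent value for the categorical one.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode().iloc[0])

# New category: give missing entries their own label (the label is an arbitrary choice).
with_category = df.copy()
with_category["city"] = with_category["city"].fillna("missing")

# Binary indicator: 1 where the feature is missing, 0 otherwise.
indicator = df["age"].isna().astype(int)

print(len(df), "rows before deletion,", len(dropped), "after")
print(imputed)
print(with_category)
print(indicator.tolist())
```

Note that listwise deletion halves this toy dataset, while the other options keep every row.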
Tools
- pandas
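- sklearn: SimpleImputer in sklearn.impute (the older Imputer class in sklearn.preprocessing was removed in scikit-learn 0.22)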
- @ResidentMario/missingno: visualize missing data
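As a rough sketch of the scikit-learn route, assuming a recent release in which the imputer class is SimpleImputer from sklearn.impute: the imputer learns its fill values on the training data and reuses them when the model is applied, so no record has to be deleted at prediction time. The arrays below are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up training data with one missing value in each column.
X_train = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

# Median imputation; add_indicator=True appends one binary "was missing"
# column for every feature that had missing values during fit.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_train_out = imputer.fit_transform(X_train)

# At prediction time, reuse the medians learned from the training data.
X_new = np.array([[np.nan, np.nan]])
X_new_out = imputer.transform(X_new)

print(imputer.statistics_)  # per-column medians learned during fit
print(X_train_out)
print(X_new_out)
```

The output has four columns: the two imputed features followed by their two missingness indicators.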
Planted:
by L Ma;
LM (2019). 'Dealing with Missing Data in Machine Learning', Datumorphism, 08 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/missing-data/.