How to Deal with Missing Data
- Listwise deletion: remove the whole record. This works if the values are missing completely at random.
- Removing values causes problems in many respects. For example, we cannot simply delete records when applying our models, because a model must still produce an output for inputs with missing values.
- Impute with the most frequent value (the mode)
- Impute with a measure of central tendency: median, mean, etc.
- Impute with a fixed value: a placeholder string, etc.
- New category: define a new category for missing data
- Convert the column to a binary-valued column indicating whether the feature is missing or not.
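- sklearn: `SimpleImputer` in `sklearn.impute` (it replaced the older `Imputer` class, which has been removed from scikit-learn)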
- @ResidentMario/missingno: visualize missing data
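
The strategies listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a recommended implementation: the toy `records`, the column names, and the helper functions `listwise_delete`, `impute`, and `add_missing_indicator` are all hypothetical, and `None` is assumed to mark a missing value. In practice one would more likely reach for library tools such as the scikit-learn imputer listed above.

```python
from statistics import median, mode

# Hypothetical toy records; None marks a missing value (an assumption of this sketch).
records = [
    {"age": 25, "city": "Berlin"},
    {"age": None, "city": "Paris"},
    {"age": 40, "city": None},
    {"age": 31, "city": "Berlin"},
]


def listwise_delete(rows):
    """Drop every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute(rows, column, fill):
    """Return copies of the rows with missing `column` values replaced.

    `fill` is either a constant (fixed value / new category) or a function
    such as median or mode that is applied to the observed values.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    value = fill(observed) if callable(fill) else fill
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


def add_missing_indicator(rows, column):
    """Add a binary `<column>_missing` flag recording where values were absent."""
    return [{**r, f"{column}_missing": int(r[column] is None)} for r in rows]


complete = listwise_delete(records)               # keeps only fully observed records
ages_median = impute(records, "age", median)      # central tendency
cities_mode = impute(records, "city", mode)       # most frequent value
cities_new = impute(records, "city", "Unknown")   # fixed value / new category
flagged = add_missing_indicator(records, "age")   # binary missingness column
```

One design point worth noting: when a fill value is derived from the data (median, mode), it is usually computed on the training set and then reused unchanged when the model is applied to new records, so that training and prediction see the same treatment of missing values.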
Planted: by L Ma;
LM (2019). 'Dealing with Missing Data in Machine Learning', Datumorphism, 08 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/missing-data/.