Dealing with Missing Data in Machine Learning
How to Deal with Missing Data
- Remove
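- Listwise deletion: remove the whole record. This works if the values are missing completely at random, i.e. missingness is unrelated to the data itself.
- Removing records causes problems in several respects. For example, we cannot simply drop records at prediction time, because the model still has to produce an output for every input.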
- Replace
- with the most frequent value
- with a measure of central tendency: median, mean, etc.
- with a fixed value: a placeholder string, etc.
- New Category: define a new category for missing data
- Convert the column to a binary-valued column indicating whether the feature is missing or not.
- pandas: isna, dropna, fillna
- sklearn: SimpleImputer in sklearn.impute (replaces the older Imputer class, which was removed in scikit-learn 0.22)
- @ResidentMario/missingno: visualize missing data
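
The strategies listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not a recommended pipeline: the DataFrame, its column names (age, city) and the "Unknown" label are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Lyon", None, "Paris"],
})

# Remove: listwise deletion drops every record with at least one missing value.
dropped = df.dropna()

# Replace: median for a numeric column, most frequent value for a categorical one.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode().iloc[0])

# New category: give missing entries their own explicit label.
labeled = df["city"].fillna("Unknown")

# Binary indicator: 1 where the feature was missing, 0 otherwise.
age_missing = df["age"].isna().astype(int)
```

Which replacement is appropriate depends on the column: the median is less sensitive to outliers than the mean, while the indicator column keeps the information that a value was missing in the first place.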

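Because records cannot simply be dropped when a model is applied, one option is to learn the replacement values on the training data and reuse them at prediction time. The sketch below assumes scikit-learn 0.21 or later and uses SimpleImputer with a median strategy; the arrays X_train and X_new are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrices; missing entries are encoded as NaN.
X_train = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])
X_new = np.array([[np.nan, np.nan]])

# strategy="median" learns one median per column from the training data.
# add_indicator=True appends one binary "was missing" column per feature
# that had missing values during fit.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_train_filled = imputer.fit_transform(X_train)

# At prediction time, reuse the medians learned from the training data.
X_new_filled = imputer.transform(X_new)
```

Fitting the imputer only on the training data keeps statistics from new or held-out records out of the replacement values.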