Dealing with Missing Data in Machine Learning

How to Deal with Missing Data

  1. Remove
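    1. Listwise deletion: remove the whole record. This works only if the values are missing at random, so that the remaining records stay representative.
    2. Caveat: removing records causes problems in several respects. For example, we cannot simply delete records when applying a trained model, because every incoming record still needs a prediction.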
  2. Replace
    1. with the most frequent value (mode)
    2. with a measure of central tendency: median, mean, etc.
    3. with a fixed value, e.g. a placeholder string
  3. New Category: define a new category for missing data
  4. Indicator: convert the column to a binary-valued column indicating whether the feature is missing or not.
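
The four strategies above can be sketched with pandas roughly as follows. This is a minimal illustration, not a recommended pipeline: the toy DataFrame, its column names (`age`, `city`), the `"missing"` label, and the `age_missing` column are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Berlin", "Paris", None, "Berlin"],
})

# 1. Remove: listwise deletion drops every record with at least one missing value.
dropped = df.dropna()

# 2. Replace: most frequent value for the categorical column,
#    median for the numeric column.
filled = df.copy()
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
filled["age"] = filled["age"].fillna(filled["age"].median())

# 3. New category: treat "missing" as its own level of the categorical feature.
categorized = df.copy()
categorized["city"] = categorized["city"].fillna("missing")

# 4. Indicator: binary column marking whether the feature was missing.
flagged = df.copy()
flagged["age_missing"] = flagged["age"].isna().astype(int)
```

Note that listwise deletion keeps only two of the four records here, which shows how quickly it can shrink a small dataset.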

Tools

  1. pandas
  2. sklearn: SimpleImputer in sklearn.impute (replaces the older Imputer class, which was removed in scikit-learn 0.22)
  3. @ResidentMario/missingno: visualize missing data
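
As a rough sketch of the sklearn route, assuming a recent scikit-learn in which the imputer is available as `SimpleImputer` in `sklearn.impute`: the imputer learns its fill values on the training data and reuses them on new records, so nothing has to be deleted when the model is applied. The arrays `X_train` and `X_new` are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training data with missing entries in both columns.
X_train = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# A new record seen at prediction time; it cannot simply be dropped.
X_new = np.array([[np.nan, 5.0]])

# Median imputation; add_indicator=True appends one binary column per
# feature that had missing values during fit.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_train_imputed = imputer.fit_transform(X_train)

# The medians learned on the training data are reused for the new record.
X_new_imputed = imputer.transform(X_new)
```

Combining imputation with the indicator columns keeps the information that a value was missing, which corresponds to strategies 2 and 4 of the list above.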

LM (2019). 'Dealing with Missing Data in Machine Learning', Datumorphism, 08 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/missing-data/.