Deal with Rare Categories Using Pandas

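We will illustrate how to deal with rare categories in pandas using a boolean mask: names that occur fewer than 10 times are replaced by a single placeholder category, X.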

import pandas as pd

# Create fake names
frequent_names = list('ABC')
rare_names = list('DEF')

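# Repeat each frequent name 10 times and each rare name 2 times,
# then flatten the list of lists into a single list
dataset = sum(
    [[i]*10 for i in frequent_names] + [[i]*2 for i in rare_names],
    []
)
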
# Create a series based on the names
series = pd.Series(dataset)


# Find the counts of the names in the series
series_counts = series.value_counts()
print(series_counts)

# Find names that have fewer than 10 counts
# and create a mask that is True wherever one of those names appears
mask = series.isin(series_counts.loc[series_counts < 10].index)

# Set these rare names to X
series[mask] = 'X'

# Check the new series
print(series.value_counts())
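The replacement step can also be expressed with pandas' built-in `Series.mask(cond, other)`, which returns a new Series holding `other` wherever `cond` is True, so the original series is left unmodified. A minimal sketch follows; the helper name `group_rare_categories` and its `min_count` and `other` parameters are hypothetical, chosen for illustration. Here the condition is built by mapping each element to the count of its category with `series.map(series.value_counts())`, an alternative to the `isin` approach above that should give the same counts.

```python
import pandas as pd


def group_rare_categories(series: pd.Series, min_count: int = 10, other: str = "X") -> pd.Series:
    """Hypothetical helper: replace categories seen fewer than `min_count` times with `other`.

    Returns a new Series; the input series is not changed.
    """
    # Map every element to the total number of times its category occurs
    counts = series.map(series.value_counts())
    # Series.mask puts `other` where the condition is True and keeps the rest
    return series.mask(counts < min_count, other)


# Same fake data as above: A, B, C appear 10 times each; D, E, F twice each
frequent_names = list("ABC")
rare_names = list("DEF")
dataset = [name for name in frequent_names for _ in range(10)] + [
    name for name in rare_names for _ in range(2)
]
series = pd.Series(dataset)

grouped = group_rare_categories(series, min_count=10, other="X")
print(grouped.value_counts())
```

Returning a new Series instead of assigning in place keeps the raw data available, which can help when the threshold is tuned later.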

The original series has value counts

C    10
A    10
B    10
F     2
D     2
E     2

The new series has value counts

C    10
A    10
B    10
X     6
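The cutoff of 10 used above is an absolute count, so what counts as "rare" shifts with the size of the dataset. One possible variant, sketched here under the assumption that a relative cutoff is wanted, uses `value_counts(normalize=True)` to get each name's share of the data. The 10% threshold and the variable name `min_fraction` are illustrative choices, not from the original. With 36 names in total, A, B and C each make up 10/36 (about 28%) while D, E and F each make up 2/36 (about 6%), so the same three names are grouped.

```python
import pandas as pd

# Same fake data as above
frequent_names = list("ABC")
rare_names = list("DEF")
dataset = [name for name in frequent_names for _ in range(10)] + [
    name for name in rare_names for _ in range(2)
]
series = pd.Series(dataset)

# Share of each name in the data (the fractions sum to 1)
frequencies = series.value_counts(normalize=True)

# Assumed cutoff: names making up less than 10% of the data are rare
min_fraction = 0.1
rare = frequencies[frequencies < min_fraction].index

# Replace the rare names on a copy so the original series is kept
grouped = series.copy()
grouped.loc[grouped.isin(rare)] = "X"
print(grouped.value_counts())
```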


LM (2021). 'Deal with Rare Categories Using Pandas', Datumorphism, 03 April. Available at: