Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns

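Pandas groupby does not guarantee that the group labels look unique. Keys that print identically but have different data types, such as the integer 1 and the string "1", are treated as distinct groups. Columns with mixed types therefore require additional attention.

The following example groups by a column that holds both an integer and a string.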

import pandas as pd

data = [{"student": 1, "score": 1}, {"student": "1", "score": 2}]

df = pd.DataFrame(data)

df.groupby("student").count()

The result contains two groups whose labels look identical:

        score
student
1       1
1       1

The integer 1 and the string "1" are different keys, so each one forms its own group even though both are displayed as 1.
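One way to spot this situation before grouping is to look at the Python types stored in the key column. The snippet below is a minimal sketch that reuses the data from the example above; the names `column_dtype`, `type_counts`, and `has_mixed_types` are only illustrative.

```python
import pandas as pd

data = [{"student": 1, "score": 1}, {"student": "1", "score": 2}]
df = pd.DataFrame(data)

# A column holding both int and str values is stored with the generic object dtype.
column_dtype = df["student"].dtype

# Count how many values of each Python type appear in the key column.
type_counts = df["student"].map(type).value_counts()

# More than one type in the key column is a warning sign before grouping.
has_mixed_types = len(type_counts) > 1
print(column_dtype, has_mixed_types)
```

If `has_mixed_types` is true, the column should probably be converted to a single type before it is used as a groupby key.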

Suppose we are given a set of student scores and want to use each student's median score as the final score. If the student ids are stored as a mix of strings and integers, the same student is split across several groups, and each median is computed from only part of that student's scores. It is crucial to make sure the data types are the same before grouping.
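A minimal sketch of the median scenario, assuming the ids are either integers or digit strings and that none are missing: casting the key column to a single type with `astype(str)` before grouping merges 1 and "1" into one key. The scores below are made up for illustration.

```python
import pandas as pd

# Hypothetical scores; each student appears under both an int and a str id.
data = [
    {"student": 1, "score": 1},
    {"student": "1", "score": 2},
    {"student": 2, "score": 5},
    {"student": "2", "score": 7},
    {"student": 2, "score": 9},
]
df = pd.DataFrame(data)

# Cast every id to str so that 1 and "1" become the same key.
df["student"] = df["student"].astype(str)

# Each student now forms a single group, so the median uses all of their scores.
median_scores = df.groupby("student")["score"].median()
print(median_scores)
```

Casting to strings is only one option. Converting to integers works just as well when every id is numeric, while floats or missing values in the column would need separate handling, since `astype(str)` would turn them into strings such as "1.0" or "nan".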


Lei Ma (2020). 'Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns', Datumorphism, 04 April. Available at: https://datumorphism.leima.is/til/machine-learning/pandas-groupby-caveats/.