Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns
Pandas groupby does not guarantee that the values in the groupby columns look unique, because it distinguishes keys by data type as well as by value. Mixed types in a key column therefore require extra attention.
Pandas groupby considers the data type of each key, not only its printed value. For example:
import pandas as pd
data = [{"student": 1, "score": 1}, {"student": "1", "score": 2}]
df = pd.DataFrame(data)
df.groupby("student").count()
What we see is:
         score
student
1            1
1            1
The integer 1 and the string "1" are different keys, so they end up in separate groups even though they print identically.
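Because the two keys print the same, the problem is easy to miss. A minimal sketch of one way to spot it, reusing the example data above, is to count the Python types present in the key column. The variable names `type_counts` and `has_mixed_types` are only illustrative.

```python
import pandas as pd

data = [{"student": 1, "score": 1}, {"student": "1", "score": 2}]
df = pd.DataFrame(data)

# A column that holds both int and str values is stored with the generic object dtype.
print(df["student"].dtype)

# Map every value to its Python type and count how often each type occurs.
type_counts = df["student"].map(type).value_counts()
print(type_counts)

# More than one distinct type means groupby may split keys that look identical.
has_mixed_types = len(type_counts) > 1
print(has_mixed_types)
```

If more than one type shows up, the column should be cast to a single type before grouping.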
Suppose we are given a collection of student scores and want to use each student's median score as the final score. If the student ids are encoded as strings in some rows and as integers in others, the same student is split across two groups and each group gets its own median. It is crucial to make sure the ids share one data type before grouping.
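The median scenario can be sketched as follows. The scores are made up for illustration, and casting the ids to `str` is an assumption: casting to `int` would work equally well if every id is numeric.

```python
import pandas as pd

# Hypothetical scores; the student ids arrive as a mix of int and str.
data = [
    {"student": 1, "score": 1},
    {"student": "1", "score": 2},
    {"student": 2, "score": 5},
    {"student": "2", "score": 7},
    {"student": 2, "score": 9},
]
df = pd.DataFrame(data)

# Cast all ids to a single type so that 1 and "1" become the same key.
df["student"] = df["student"].astype(str)

# One group per student, with the median of that student's scores.
final_scores = df.groupby("student")["score"].median()
print(final_scores)
```

Without the `astype` call, the same data would produce four groups instead of two.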
Planted by L Ma
Lei Ma (2020). 'Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns', Datumorphism, 04 April. Available at: https://datumorphism.leima.is/til/machine-learning/pandas-groupby-caveats/.