Calculate unique values in one column based upon non-null values in another

Question

Working through this: https://towardsdatascience.com/exploratory-statistical-data-analysis-with-a-real-dataset-using-pandas-208007798b92

A little shy of halfway through, the author calculates the number of unique medal winners with this line of code:

medal_winners = len(df[df.Medal.fillna('None') != 'None'].Name.unique())

This seems rather unnecessarily complicated, so I am trying to simplify it.

Ultimately, I believe that line of code is saying: first check for non-null values in the 'Medal' column, then get the number of unique names who have won medals.

To me this is: check 'Medal' for a non-null value, then groupby name and get the number of unique names who have won a medal. The type of medal does not matter, so if John Doe won three different medals, I only count him once. All I want is the total number of unique medal winners.

I came up with this:

medal_winners = df['Medal'].notnull().groupby['Name'].nunique()

But I get this error: TypeError: 'method' object is not subscriptable

I have tried other variations on what I thought should work, but every one of them raises an error.

Answer 1

I just figured it out, but even with groupby() the solution is still longer than I expected -- or maybe I should say I did not achieve what I thought would be increased simplification:

medal_winners = df[df['Medal'].notnull()].groupby('Name')['Name'].nunique().sum()
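As far as I can tell, the attempt in the question failed for two reasons. First, groupby is a method, so it has to be called with parentheses; writing groupby['Name'] tries to subscript the method itself, which is what raises the TypeError. Second, df['Medal'].notnull() returns a boolean Series that no longer carries the 'Name' column, so the mask has to be used to filter the DataFrame before grouping. Here is a minimal sketch on a made-up toy DataFrame (the names and medals below are invented for illustration, not taken from the article's dataset):

```python
import pandas as pd

# Hypothetical toy data, not the Olympic dataset from the article.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cid", "Dee", "Dee"],
    "Medal": ["Gold", "Silver", None, "Bronze", None, None],
})

# Reproduce the error: groupby is subscripted instead of called.
try:
    df["Medal"].notnull().groupby["Name"]
except TypeError as exc:
    error_message = str(exc)

# Working version: build the mask, filter the rows, then group.
mask = df["Medal"].notnull()            # True where a medal was won
winners = df[mask]                      # rows for medal winners only
per_name = winners.groupby("Name")["Name"].nunique()  # 1 per winner
count_via_groupby = per_name.sum()      # number of distinct winners

print(error_message)
print(count_via_groupby)
```

Each group contains exactly one distinct name, so every entry of per_name is 1 and the sum is simply the number of groups.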

Both my groupby() based solution and the author's yield an answer of: 28202
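Looking at it again, the groupby() step does not seem to be needed at all: once the rows are filtered to medal winners, calling nunique() on the 'Name' column should give the same count directly. The sketch below compares the three forms on a made-up toy DataFrame (the data is invented for illustration; I have not rerun this variant against the article's dataset):

```python
import pandas as pd

# Hypothetical toy data, not the Olympic dataset from the article.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cid", "Dee", "Dee"],
    "Medal": ["Gold", "Silver", None, "Bronze", None, None],
})

# The article author's version: fill nulls with a sentinel, then filter.
author_count = len(df[df.Medal.fillna("None") != "None"].Name.unique())

# The groupby() version from this answer.
groupby_count = (
    df[df["Medal"].notnull()].groupby("Name")["Name"].nunique().sum()
)

# Shorter variant: select names of medal rows, count distinct values.
short_count = df.loc[df["Medal"].notna(), "Name"].nunique()

print(author_count, groupby_count, short_count)
```

On this toy data all three expressions agree, counting Ann once even though she has two medals.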
