Pandas map on series

I have a DataFrame with lots of categories, but I'm only trying to use two. I managed to get the result I wanted, but it wasn't accepted in my project ('there's better ways of doing it'). Working with two columns, Gender (M/F) and Showed (1/0), I'm trying to get four variables (male1, male0, female1, female0) to create a bar chart with them.

I was told to use the pd.Series.map function, but I've looked everywhere and can't find a good example of it. I'm also not really sure how to get four variables out of it.

Thanks for any help.

pd.Series.map is unnecessary. You can use GroupBy here and output a dictionary:

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

d = df.groupby(['Gender', 'Showed']).size().to_dict()

# {('F', 0): 2, ('F', 1): 1, ('M', 0): 1, ('M', 1): 2}

In general, you should avoid creating a variable number of variables. A dictionary lets you extract values efficiently, e.g. d[('F', 0)] gives the count for Gender 'F' and Showed 0.
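Since the goal in the question is a bar chart, here is a minimal sketch of turning that dictionary into bar labels and heights. The label scheme ('M1', 'F0', ...) and the names labels and heights are only illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# Count each (Gender, Showed) pair, as in the answer above.
d = df.groupby(['Gender', 'Showed']).size().to_dict()

# Hypothetical label scheme: gender code followed by the Showed flag, e.g. 'M1'.
labels = [f"{gender}{showed}" for gender, showed in d]
heights = [int(n) for n in d.values()]

print(labels)   # ['F0', 'F1', 'M0', 'M1']
print(heights)  # [2, 1, 1, 2]
```

If matplotlib is available, these two lists can be passed straight to plt.bar(labels, heights).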


But if you really must use map, you can use the pd.Index.map version:

d = df.groupby(['Gender', 'Showed']).size()

res = df.drop_duplicates().copy()  # explicit copy, so adding a column below does not raise SettingWithCopyWarning
res['Counts'] = res.set_index(['Gender', 'Showed']).index.map(d.get)

print(res)

  Gender  Showed  Counts
0      M       0       1
1      M       1       2
3      F       0       2
5      F       1       1
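For completeness, here is a rough sketch of what pd.Series.map itself does, since the question asks for an example. Whether this is what the reviewer had in mind is a guess. The 'male'/'female' names and the keys variable are assumptions chosen to match the variable names in the question.

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# Series.map with a dict replaces each value by the matching dictionary entry.
gender_names = df['Gender'].map({'M': 'male', 'F': 'female'})

# Build one key per row (e.g. 'male1'), then count how often each key occurs.
keys = gender_names + df['Showed'].astype(str)
counts = keys.value_counts().to_dict()

print(counts['male1'], counts['male0'], counts['female1'], counts['female0'])
# 2 1 1 2
```

The four values still live in one dictionary rather than four separate variables, in line with the advice above.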

This seems like a case for crosstab (it's a built-in function :D)

import pandas as pd
df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

pd.crosstab(df.Gender, df.Showed)

Output:

Showed  0  1
Gender      
F       2  1
M       1  2
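If the four individual numbers are still needed, they can be read out of the crosstab with .loc, as in this small sketch. The names ct, male0 and so on are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([['M', 0], ['M', 1], ['M', 1], ['F', 0], ['F', 0], ['F', 1]],
                  columns=['Gender', 'Showed'])

# Rows are Gender values, columns are Showed values.
ct = pd.crosstab(df['Gender'], df['Showed'])

# .loc[row_label, column_label] picks out a single cell of the table.
male0, male1 = ct.loc['M', 0], ct.loc['M', 1]
female0, female1 = ct.loc['F', 0], ct.loc['F', 1]

print(male0, male1, female0, female1)  # 1 2 2 1
```

For the chart itself, ct.plot.bar() should draw grouped bars directly from the table, assuming matplotlib is installed.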

You can do this in 4 simple lines.

male0 = ((df['Gender'] == 'M') & (df['Showed'] == 0)).sum()
female0 = ((df['Gender'] == 'F') & (df['Showed'] == 0)).sum()
male1 = ((df['Gender'] == 'M') & (df['Showed'] == 1)).sum()
female1 = ((df['Gender'] == 'F') & (df['Showed'] == 1)).sum()

Using apply: since the condition depends on two columns rather than one, apply it row-wise with axis=1.

male0 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'M' and row['Showed'] == 0, axis=1).sum() 
female0 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'F' and row['Showed'] == 0, axis=1).sum() 
male1 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'M' and row['Showed'] == 1, axis=1).sum() 
female1 = df[['Gender', 'Showed']].apply(lambda row: row['Gender'] == 'F' and row['Showed'] == 1, axis=1).sum() 

Using groupby, which gives one row per Gender/Showed combination along with its count:

counts = df.groupby(['Gender', 'Showed']).size().reset_index(name='Count')   
