I'm trying to understand groupby and pivot_table better.
I have a DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": np.random.choice([2017, 2018, 2019], 1000),
                   "Age": np.random.choice(['<30', '30-40', '40-50', '50+'], 1000),
                   "Pref": np.random.choice(['Yes', 'No'], 1000)})
How do I group the data by ['Year', 'Age'] so that the resulting column gives the percentage of 'Pref' values that were 'Yes' for each age group in each year? For example, what percentage of under-30s responded 'Yes' in 2017?
I'd like to produce a transformed dataframe something like:
            % Yes
Year Age
2017 <30       45
     30-40     52
     40-50     58
     50+       44
2018 <30       56
     30-40     53
     40-50     50
     50+       44
2019 <30       40
     30-40     38
     40-50     51
     50+       53
How can I do that?
df.groupby(['Year', 'Age']).agg(lambda x: 100 * sum(i == 'Yes' for i in x) / len(x))
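Note that the one-liner above names the result column 'Pref' rather than '% Yes'. As a rough sketch (the six-row frame below is made-up data, chosen only so the numbers can be checked by eye), named aggregation can apply the same kind of lambda and set the column name in one step:

```python
import pandas as pd

# Hypothetical toy data: 1 of 4 'Yes' in (2017, '<30'), 2 of 2 'Yes' in (2018, '50+').
df = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017, 2018, 2018],
    "Age":  ["<30", "<30", "<30", "<30", "50+", "50+"],
    "Pref": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Named aggregation: output column name -> (input column, aggregation function).
# The dict is unpacked with ** because '% Yes' is not a valid Python identifier.
pct = df.groupby(["Year", "Age"]).agg(
    **{"% Yes": ("Pref", lambda s: 100 * (s == "Yes").mean())}
)
print(pct)
```

The mean of a boolean Series is the fraction of True values, so multiplying by 100 gives the percentage directly.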
Try this:
(df['Pref'] == 'Yes').rename('% Yes').groupby([df['Year'], df['Age']]).mean()*100
Output:
            % Yes
Year Age
2017 30-40  50.000000
     40-50  56.470588
     50+    44.871795
     <30    44.086022
2018 30-40  62.162162
     40-50  47.368421
     50+    42.682927
     <30    45.205479
2019 30-40  52.564103
     40-50  46.478873
     50+    47.959184
     <30    46.153846
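In the output above, '<30' sorts last because the Age labels are plain strings and are ordered lexicographically. Since the question also asks about pivot_table, here is a possible sketch that uses it and makes Age an ordered categorical so the rows come out in the order shown in the question. The seed, the helper list `ages`, and the choice to store 0/100 in a '% Yes' column are my own assumptions, not part of the original answers:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
ages = ["<30", "30-40", "40-50", "50+"]  # desired display order

df = pd.DataFrame({
    "Year": rng.choice([2017, 2018, 2019], 1000),
    "Age": rng.choice(ages, 1000),
    "Pref": rng.choice(["Yes", "No"], 1000),
})

# Ordered categorical: groups are sorted by category order, not alphabetically.
df["Age"] = pd.Categorical(df["Age"], categories=ages, ordered=True)

# 100 for 'Yes', 0 for 'No'; the mean of this column is the percentage of 'Yes'.
df["% Yes"] = df["Pref"].eq("Yes") * 100

result = df.pivot_table(
    index=["Year", "Age"],
    values="% Yes",
    aggfunc="mean",
    observed=True,  # only keep Year/Age combinations that occur in the data
)
print(result.round(1))
```

The numbers will differ from the output above because the data is random; only the row order and the shape of the result are the point here.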