
How do I calculate the percentage of a particular response in groupby / pivot table?

Trying to understand groupby and pivot_table better.

I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": np.random.choice([2017, 2018, 2019], 1000),
                   "Age": np.random.choice(['<30', '30-40', '40-50', '50+'], 1000),
                   "Pref": np.random.choice(['Yes', 'No'], 1000)})

How do I group the data by ['Year', 'Age'] so that the resulting column gives the percentage of 'Pref' responses that were 'Yes' for each age group in each year? For example, what percentage of under-30s responded 'Yes' in 2017?

I'd like to produce a transformed dataframe something like:

                  % Yes
Year    Age
2017    <30       45 
        30-40     52
        40-50     58
        50+       44 
2018    <30       56 
        30-40     53
        40-50     50
        50+       44 
2019    <30       40 
        30-40     38
        40-50     51
        50+       53 

How can I do that?

df.groupby(['Year', 'Age']).agg(lambda x: 100 * sum(i == 'Yes' for i in x) / len(x))
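The lambda above is applied to every column left after grouping, which here happens to be only 'Pref'. As a minimal sketch of the same idea, the version below selects 'Pref' explicitly and takes the mean of a boolean mask instead of looping in Python. The `small` frame and the `pct_yes` name are made up for illustration so the numbers can be checked by hand; they are not from the question.

```python
import pandas as pd

# Hypothetical, hand-checkable data (not the random frame from the question)
small = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017, 2018, 2018],
    "Age":  ["<30", "<30", "<30", "<30", "50+", "50+"],
    "Pref": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Select the column explicitly; the mean of a True/False mask is the share of 'Yes'
pct_yes = (
    small.groupby(["Year", "Age"])["Pref"]
         .agg(lambda s: 100 * (s == "Yes").mean())
         .rename("% Yes")
)
print(pct_yes)
```

With this data, one of the four 2017 under-30 responses is 'Yes', so that group comes out at 25, and both 2018 50+ responses are 'Yes', so that group comes out at 100.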

Try this:

(df['Pref'] == 'Yes').rename('% Yes').groupby([df['Year'], df['Age']]).mean()*100

Output:

                % Yes
Year Age             
2017 30-40  50.000000
     40-50  56.470588
     50+    44.871795
     <30    44.086022
2018 30-40  62.162162
     40-50  47.368421
     50+    42.682927
     <30    45.205479
2019 30-40  52.564103
     40-50  46.478873
     50+    47.959184
     <30    46.153846

