
Calculate a percentage from a DataFrame that has the same ID with multiple values in the 'Value' column

I have a DataFrame with 45 unique Value_ID values, and each one has several corresponding entries in the Value column, such as 'bread', 'slice', 'jelly', and 'powder'.

Here is a made-up sample of the dataset:

Value_ID     Value
1000         bread
1000         bread
1000         bread
1000         bread
1000         jelly
1000         bread
1001         powder
1001         bread
1001         bread
1001         bread
1001         bread
1002         slice 
1002         powder
1002         bread
1002         jelly

From this data, I am trying to get the number (count) of Value_IDs for which at least 80% of the values are 'bread'. In this case the count is 2, and the matching Value_IDs are 1000 and 1001.

You can use groupby.mean on the boolean Series to get the proportion of 'bread' per Value_ID, then filter:

(df['Value'].eq('bread')
 .groupby(df['Value_ID']).mean()
 .loc[lambda x: x>=0.8]
 .index.to_list()
)

output: [1000, 1001]

Intermediate:

(df['Value'].eq('bread')
 .groupby(df['Value_ID']).mean()
)

output:

Value_ID
1000    0.833333
1001    0.800000
1002    0.250000
Name: Value, dtype: float64
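
For a self-contained check, here is a minimal sketch that rebuilds the sample data from the question and applies the same groupby.mean approach. Since the question asks for the number of matching IDs, it also takes the length of the resulting list. The names `is_bread`, `share`, `ids`, and `count` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Value_ID": [1000] * 6 + [1001] * 5 + [1002] * 4,
    "Value": [
        "bread", "bread", "bread", "bread", "jelly", "bread",
        "powder", "bread", "bread", "bread", "bread",
        "slice", "powder", "bread", "jelly",
    ],
})

# Boolean Series: True where the row is 'bread'
is_bread = df["Value"].eq("bread")

# The mean of a boolean Series is the fraction of True values,
# so this gives the share of 'bread' rows per Value_ID
share = is_bread.groupby(df["Value_ID"]).mean()

# Keep the IDs whose share is at least 80%, then count them
ids = share[share >= 0.8].index.to_list()
count = len(ids)

print(ids)
print(count)
```

If the real column contains stray whitespace (for example 'slice ' with a trailing space), calling `.str.strip()` on the column before the comparison may be worth considering.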
