
Probability of a categorical column in a pandas DataFrame

I have a pandas DataFrame like this:

0 Age color country
1  23  red    Us
2  25  black  UK
3  19  blue   UK
4  10  red    India
5  15  red    UK

What I want to do is find the probability of each category in the 'color' column and get something like this:

0 Age color country  color_pro
1  23  red    Us       0.6 
2  25  black  UK       0.2
3  19  blue   UK       0.2
4  10  red    India    0.6
5  15  red    UK       0.6

What should I do to get the probability in a tuple, like this:

0 color color_pro
1 red    0.6 
2 black  0.2
3 blue   0.2
4 red    0.6
5 red    0.6

I want to have the probability in another tuple:

0 color_pro
1  0.6 
2  0.2
3  0.2
4  0.6
5  0.6

Use groupby and count to get the number of rows in each category, then divide by the total number of rows to get the proportion.

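# Count the rows that share each row's color; transform keeps the original index
df['color_pro'] = df.groupby('color')['color'].transform('count')
# Divide by the total number of rows to turn counts into proportions
df['color_pro'] = df['color_pro'] / len(df)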
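For reference, here is a minimal self-contained sketch of the two-step approach. It assumes the sample data from the question, with the 1 to 5 row labels taken from the table as the index; the intermediate name `counts` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (index 1..5 is an assumption)
df = pd.DataFrame(
    {
        "Age": [23, 25, 19, 10, 15],
        "color": ["red", "black", "blue", "red", "red"],
        "country": ["Us", "UK", "UK", "India", "UK"],
    },
    index=range(1, 6),
)

# Step 1: per-row count of how many rows have the same color
counts = df.groupby("color")["color"].transform("count")

# Step 2: divide by the number of rows to get the relative frequency
df["color_pro"] = counts / len(df)

print(df)
```

Because `transform` returns a result aligned with the original rows, the new column can be assigned directly without a merge.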

Or, combining both lines into one:

df['color_pro'] = df.groupby('color')['color'].transform(lambda x : x.count()/len(df))
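As an alternative that is not part of the answer above, the same proportions can be computed with `value_counts(normalize=True)` and mapped back onto the column. The sketch below again assumes the question's sample data, and also shows one way to get the narrower outputs the question asks for; the names `probs`, `color_with_prob`, and `prob_only` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (index 1..5 is an assumption)
df = pd.DataFrame(
    {
        "Age": [23, 25, 19, 10, 15],
        "color": ["red", "black", "blue", "red", "red"],
        "country": ["Us", "UK", "UK", "India", "UK"],
    },
    index=range(1, 6),
)

# Relative frequency of each distinct color, indexed by color
probs = df["color"].value_counts(normalize=True)

# Look up each row's color in that Series
df["color_pro"] = df["color"].map(probs)

# Only the color and its probability
color_with_prob = df[["color", "color_pro"]]

# Only the probability column, kept as a one-column DataFrame
prob_only = df[["color_pro"]]

print(color_with_prob)
print(prob_only)
```

Selecting with a list of column names returns a DataFrame; `df["color_pro"]` without the extra brackets would return a Series instead.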


 