Probability of a categorical column in a pandas DataFrame
I have a pandas DataFrame like this:
0 Age color country
1 23 red Us
2 25 black UK
3 19 blue UK
4 10 red India
5 15 red UK
What I want to do is find the probability (relative frequency) of each category in the 'color' column and get something like this:
0 Age color country color_pro
1 23 red Us 0.6
2 25 black UK 0.2
3 19 blue UK 0.2
4 10 red India 0.6
5 15 red UK 0.6
How should I find the probability in a tuple? Like this:
0 color color_pro
1 red 0.6
2 black 0.2
3 blue 0.2
4 red 0.6
5 red 0.6
I want to have the probability in another tuple:
0 color_pro
1 0.6
2 0.2
3 0.2
4 0.6
5 0.6
Use groupby and count to get the number of rows in each category, then divide by the total number of rows to get the proportion.
df['color_pro'] = df.groupby('color')['color'].transform('count')
df['color_pro'] = df['color_pro'].map(lambda x : x/len(df))
Or, combining both lines into one, we can do this as well:
df['color_pro'] = df.groupby('color')['color'].transform(lambda x : x.count()/len(df))
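As an alternative to groupby, here is a minimal sketch that uses value_counts(normalize=True) to build the frequency table and map to look up each row's category. It rebuilds the sample data from the question so that it runs on its own; the variable names freq and color_pro_tuple are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the question (index 1..5 as shown there)
df = pd.DataFrame(
    {
        "Age": [23, 25, 19, 10, 15],
        "color": ["red", "black", "blue", "red", "red"],
        "country": ["Us", "UK", "UK", "India", "UK"],
    },
    index=[1, 2, 3, 4, 5],
)

# Relative frequency of each category in 'color'
# (freq is an illustrative name, indexed by the category values)
freq = df["color"].value_counts(normalize=True)

# Look up each row's category in the frequency table
df["color_pro"] = df["color"].map(freq)

# If only the probabilities are needed, pull them out as a plain tuple
color_pro_tuple = tuple(df["color_pro"])
```

This should give the same color_pro column as the groupby versions above. Which one to prefer is mostly a matter of taste, although the frequency table in freq can be reused if the probabilities are needed elsewhere.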