Pandas - How to calculate, for each group and each value in a column, the percentage of values equal to or less than it
Let's say I have a dataframe like this:
x = pd.DataFrame({'person':['a','b']*5 , 'rating':[1,3,4,2,4,2,3,4,5,3]})
Now I want to compute a "preference score" for each rating of each person. I define the preference score of a rating r as
freq of rating where rating <=r - freq of rating where rating ==r
For example, person a has the following ratings:
0 a 1
2 a 4
4 a 4
6 a 3
8 a 5
Now, for example, for person a and rating = 4:
freq of rating where rating <=4 : 4/5
freq of rating where rating ==4 : 2/5
So the preference score is 4/5 - 2/5 = 2/5.
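The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the definition as stated; the helper name `pref_score` is hypothetical and not part of the original question.

```python
def pref_score(ratings, r):
    """Share of ratings <= r minus share of ratings == r (hypothetical helper)."""
    n = float(len(ratings))  # float() keeps true division on Python 2.7 as well
    freq_le = sum(1 for v in ratings if v <= r) / n
    freq_eq = sum(1 for v in ratings if v == r) / n
    return freq_le - freq_eq

ratings_a = [1, 4, 4, 3, 5]          # person a's ratings from the example
score_4 = pref_score(ratings_a, 4)   # 4/5 - 2/5
```

Since "less than or equal" minus "equal" leaves "strictly less", the score is simply the share of that person's ratings that fall strictly below r.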
How can I get the preference score for every record in this dataframe? Edit: maybe this makes it clearer:
person rating pref_score
a 1 0.0
a 4 0.4
a 4 0.4
a 3 0.2
a 5 0.8
So do you need something like this?
x.groupby('person').rating.apply(lambda x : (sum(x<=4)-sum(x==4))/len(x))
Out[7]:
person
a 0.4
b 0.8
Name: rating, dtype: float64
Or with transform?
x.groupby('person').rating.transform(lambda x : (sum(x<=4)-sum(x==4))/len(x))
Out[8]:
0 0.4
1 0.8
2 0.4
3 0.8
4 0.4
5 0.8
6 0.4
7 0.8
8 0.4
9 0.8
Name: rating, dtype: float64
Edit:
x=x.sort_values('person')
x['ref']=x.groupby('person').rating.apply(lambda y : [(sum(y<=x)-sum(y==x))/len(y) for x in y]).apply(pd.Series).stack().values
x
Out[25]:
person rating ref
0 a 1 0.0
2 a 4 0.4
4 a 4 0.4
6 a 3 0.2
8 a 5 0.8
1 b 3 0.4
3 b 2 0.0
5 b 2 0.0
7 b 4 0.8
9 b 3 0.4
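As an alternative sketch (not the method used in the answer above), the same column can probably be built without a Python-level loop by using the group-wise rank: with `method='min'`, a rating's rank is one more than the number of strictly smaller ratings in its group. The column name `pref_score` is an assumption made for illustration.

```python
import pandas as pd

x = pd.DataFrame({'person': ['a', 'b'] * 5,
                  'rating': [1, 3, 4, 2, 4, 2, 3, 4, 5, 3]})

grouped = x.groupby('person').rating
# rank(method='min') is 1 + the number of strictly smaller ratings in the group
n_smaller = grouped.rank(method='min') - 1
# number of ratings per person, broadcast back to every row
group_size = grouped.transform('size')
x['pref_score'] = n_smaller / group_size
```

This keeps the original row order, so no prior sort_values call is needed, and it assumes there are no missing ratings.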
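Since you are using Python 2.7 (assumes import numpy as np):
x['map']=x.person.map(x.groupby('person').rating.apply(list))
# share of the person's ratings strictly below this row's rating; float() avoids Python 2 integer division
x.apply(lambda row : sum(np.array(row['map'])<row['rating'])/float(len(row['map'])),1 )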
You can do the following:
>> x.groupby("person").rating.apply(lambda x: x[x <= 4].count())
person
a 4
b 5
and
>> x.groupby("person").rating.apply(lambda x: x[x == 4].count())
person
a 2
b 1
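The two counts above still have to be turned into a score. A minimal sketch of that last step, for the fixed rating 4 only and assuming true division (Python 3); the variable names are illustrative:

```python
import pandas as pd

x = pd.DataFrame({'person': ['a', 'b'] * 5,
                  'rating': [1, 3, 4, 2, 4, 2, 3, 4, 5, 3]})

grouped = x.groupby('person').rating
n_le = grouped.apply(lambda s: s[s <= 4].count())  # ratings <= 4 per person
n_eq = grouped.apply(lambda s: s[s == 4].count())  # ratings == 4 per person
# difference of the counts divided by the number of ratings per person
score_for_4 = (n_le - n_eq) / grouped.size()
```

This gives one score per person for a single rating value; getting a score for every record would mean repeating it for each distinct rating.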