Let's say I have a dataframe like this:
x = pd.DataFrame({'person':['a','b']*5 , 'rating':[1,3,4,2,4,2,3,4,5,3]})
Now I want to calculate, for each person, the 'preference score' of each rating. I define the preference score for rating r as:
(freq of ratings where rating <= r) - (freq of ratings where rating == r)
For example, person a has the following ratings:
0 a 1
2 a 4
4 a 4
6 a 3
8 a 5
Now take, for example, rating = 4 for person a:
freq of ratings where rating <= 4: 4/5
freq of ratings where rating == 4: 2/5
so the preference score is 4/5 - 2/5 = 2/5.
How do I compute the preference score for each record in that dataframe? EDIT: Perhaps this makes it clearer:
person rating pref_score
a 1 0.0
a 4 0.4
a 4 0.4
a 3 0.2
a 5 0.8
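As a sanity check on the definition, the table above for person a can be reproduced in plain Python. This is only a sketch of the definition itself, not a pandas solution; the names ratings_a and pref_score are illustrative.

```python
# Ratings of person a, in the order they appear in the dataframe.
ratings_a = [1, 4, 4, 3, 5]

def pref_score(r, ratings):
    """(freq of ratings <= r) - (freq of ratings == r)."""
    n = float(len(ratings))  # float so that / does not truncate on Python 2
    freq_le = sum(1 for v in ratings if v <= r) / n
    freq_eq = sum(1 for v in ratings if v == r) / n
    return freq_le - freq_eq

# One score per record of person a, rounded for display.
scores_a = [round(pref_score(r, ratings_a), 2) for r in ratings_a]
print(scores_a)
```

Note that the score is the same as the fraction of the person's ratings that are strictly below r.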
So you need something like this (with the rating fixed at 4)?
x.groupby('person').rating.apply(lambda x : (sum(x<=4)-sum(x==4))/len(x))
Out[7]:
person
a 0.4
b 0.8
Name: rating, dtype: float64
Or with transform, to broadcast the per-person value back to every row?
x.groupby('person').rating.transform(lambda x : (sum(x<=4)-sum(x==4))/len(x))
Out[8]:
0 0.4
1 0.8
2 0.4
3 0.8
4 0.4
5 0.8
6 0.4
7 0.8
8 0.4
9 0.8
Name: rating, dtype: float64
EDIT: to score each row against its own rating:
x=x.sort_values('person')
x['ref'] = x.groupby('person').rating.apply(lambda y: [(sum(y <= r) - sum(y == r)) / len(y) for r in y]).apply(pd.Series).stack().values
x
Out[25]:
person rating ref
0 a 1 0.0
2 a 4 0.4
4 a 4 0.4
6 a 3 0.2
8 a 5 0.8
1 b 3 0.4
3 b 2 0.0
5 b 2 0.0
7 b 4 0.8
9 b 3 0.4
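A possible alternative that avoids the sort and the Python-level list comprehension: the score as defined equals the fraction of a person's ratings strictly below r, and a group-wise rank with method='min' is one more than the number of strictly smaller values. A minimal sketch, assuming the same x as in the question; the column name pref_score is borrowed from the question's expected output, and min_rank and group_size are illustrative names.

```python
import pandas as pd

x = pd.DataFrame({'person': ['a', 'b'] * 5,
                  'rating': [1, 3, 4, 2, 4, 2, 3, 4, 5, 3]})

grouped = x.groupby('person')['rating']

# With method='min', tied ratings share the lowest rank, so
# (rank - 1) is the number of ratings in the group strictly below this one.
min_rank = grouped.rank(method='min')

# Number of ratings per person, aligned with the original rows.
group_size = grouped.transform('count')

x['pref_score'] = (min_rank - 1) / group_size
print(x)
```

Because rank and transform both return results aligned with the original index, the row order of x does not need to change.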
Since you are using Python 2.7, where / truncates integer operands, cast the length to float:
x['map'] = x.person.map(x.groupby('person').rating.apply(list))
x.apply(lambda row: sum(np.array(row['map']) < row['rating']) / float(len(row['map'])), axis=1)
You can do the following:
>> x.groupby("person").rating.apply(lambda x: x[x <= 4].count())
person
a 4
b 5
and
>> x.groupby("person").rating.apply(lambda x: x[x == 4].count())
person
a 2
b 1
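If the goal is the score itself at r = 4, the two counts above can be combined and divided by each person's number of ratings. A small sketch, assuming the x from the question; the names le_count, eq_count and score_at_4 are made up for illustration.

```python
import pandas as pd

x = pd.DataFrame({'person': ['a', 'b'] * 5,
                  'rating': [1, 3, 4, 2, 4, 2, 3, 4, 5, 3]})

ratings = x.groupby('person')['rating']

# Per-person counts of ratings <= 4 and ratings == 4.
le_count = ratings.apply(lambda s: s[s <= 4].count())
eq_count = ratings.apply(lambda s: s[s == 4].count())

# Dividing by the group size turns the count difference into a frequency.
# float() keeps the division exact on Python 2 as well.
score_at_4 = (le_count - eq_count) / ratings.size().astype(float)
print(score_at_4)
```

This gives one value per person for a fixed rating, like the first answer's apply; it does not yet give a score per record.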