
Normalize column in pandas dataframe by sum of grouped values of another column

I'm a bit stuck trying to normalize some entries of a column in a pandas dataframe. I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': [0, 0, 1, 1, 1, 2, 2],
    'item': ['A', 'B', 'A', 'B', 'C', 'B', 'C'],
    'bought': [1, 1, 1, 3, 3, 2, 3]})
df
bought|item|user
----------------
1     |A   |0
1     |B   |0
1     |A   |1
3     |B   |1
3     |C   |1
2     |B   |2
3     |C   |2

I would like to get the number of each item bought, normalized by the total bought by each user.

In other words, I'd like to divide each entry of 'bought' by the total bought by that user, and store the result as another column. In this case the output I'd like is this (the 'normalized' column doesn't have to be fractions):

bought|item|user|normalized
--------------------------
1     |A   |0   |1/2
1     |B   |0   |1/2
1     |A   |1   |1/7
3     |B   |1   |3/7
3     |C   |1   |3/7
2     |B   |2   |2/5
3     |C   |2   |3/5

So far I've grouped by user and gotten the sum per user:

grouped = df.groupby(by='user')
grouped.aggregate(np.sum)

But at this point I'm stuck. Thanks!

pandas map

df.assign(normalized=df.bought.div(df.user.map(df.groupby('user').bought.sum())))

pandas transform

df.assign(normalized=df.bought.div(df.groupby('user').bought.transform('sum')))

Both yield:

   bought item  user  normalized
0       1    A     0    0.500000
1       1    B     0    0.500000
2       1    A     1    0.142857
3       3    B     1    0.428571
4       3    C     1    0.428571
5       2    B     2    0.400000
6       3    C     2    0.600000
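
To see why the two one-liners agree, it may help to split them into steps. The following is only a sketch built on the question's sample data; the intermediate names (totals, per_row_map, per_row_transform, check) are made up for illustration and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [0, 0, 1, 1, 1, 2, 2],
    'item': ['A', 'B', 'A', 'B', 'C', 'B', 'C'],
    'bought': [1, 1, 1, 3, 3, 2, 3]})

# One total per user, indexed by user (illustrative name).
totals = df.groupby('user')['bought'].sum()

# map: look up each row's user in `totals`, giving one denominator per row.
per_row_map = df['user'].map(totals)

# transform: compute the group sum and broadcast it back onto the original rows.
per_row_transform = df.groupby('user')['bought'].transform('sum')

# Both routes should give the same per-row denominators.
assert (per_row_map == per_row_transform).all()

normalized = df['bought'] / per_row_transform

# Sanity check: within each user, the normalized shares should add up to 1.
check = normalized.groupby(df['user']).sum()
print(check)
```

With map you build the per-user totals once and look them up by key, which is handy if you need the totals elsewhere too. transform skips the intermediate Series and returns a result already aligned with df's index.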
