
Increase value by a constant based on the number of times a value occurs in another column

I have:

import pandas as pd

df = pd.DataFrame({'col1': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
                   'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})


I would like the value column to increase by a constant depending on how many times the row's col1 entry has already appeared: each occurrence of x adds 100, and each occurrence of y adds 150.

We'll start by getting the cumulative count for each item in col1:

df['value'] = df.groupby('col1').cumcount()
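
As a quick illustration of what this step produces, here is a minimal sketch on a smaller frame than the one in the question (the names demo and counts are only for this example). cumcount numbers the rows within each group starting from 0, in order of appearance:

```python
import pandas as pd

# Smaller illustrative frame, not the one from the question.
demo = pd.DataFrame({'col1': ['x', 'x', 'y', 'x', 'y']})

# cumcount() numbers the rows of each group 0, 1, 2, ... in order of appearance.
counts = demo.groupby('col1').cumcount()
print(counts.tolist())  # [0, 1, 0, 2, 1]
```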

Next, we need to multiply each count by the constant for its item:

multiples = {
    'x': 100,
    'y': 150
}
for col, value in multiples.items():
    index = df['col1'] == col
    df.loc[index,'value'] *= value

Giving the final result:

   col1  value
0     x      0
1     x    100
2     x    200
3     x    300
4     x    400
5     y      0
6     y    150
7     y    300
8     y    450
9     y    600
10    y    750
11    y    900

EDIT: SNygard beat me to it, but I'll try to present a solution that makes use of pandas' broadcasting architecture and bypasses the inefficiencies of iteration.


It is said that when you iterate over a dataframe's rows, you lose pandas' efficiency, because you are using it for a purpose it was not intended for.

Here's how I would do it:

import pandas as pd

col1_to_value_hash = {
    'x': 100,
    'y': 150
}

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
    'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
})

cumcount = df.groupby('col1').cumcount()


df['value'] = cumcount * df['col1'].apply(lambda x: col1_to_value_hash[x])
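
A possible variation, not part of the answer above: Series.map accepts a dict directly, so the lambda can be dropped. This is only a minimal sketch, and it assumes every value in col1 has an entry in the dict (with map, a missing key would yield NaN instead of raising KeyError). The name increments is hypothetical.

```python
import pandas as pd

# Hypothetical name; same per-item constants as in the question.
increments = {'x': 100, 'y': 150}

df = pd.DataFrame({'col1': ['x'] * 5 + ['y'] * 7})

# Per-group running count (0, 1, 2, ...) times the constant looked up for each row.
df['value'] = df.groupby('col1').cumcount() * df['col1'].map(increments)
print(df['value'].tolist())
# [0, 100, 200, 300, 400, 0, 150, 300, 450, 600, 750, 900]
```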
