Increase value by constant based on number of times a value occurs in another column
I have:
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
                   'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})
I would like the value column to increase by a constant depending on the number of times the item appears in col1: for each occurrence of x it increases by 100, and for each occurrence of y it increases by 150.
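To pin down the rule, here is a minimal plain-Python sketch, assuming (as the answers' output does) that the first occurrence of an item counts as 0, so the n-th occurrence gets n times the item's constant. The function `expected_values` and the `step` mapping are hypothetical names used only for illustration.

```python
from collections import defaultdict


def expected_values(items, step):
    """For each position, return step[item] * (number of earlier occurrences of item)."""
    seen = defaultdict(int)  # how many times each item has appeared so far
    out = []
    for item in items:
        out.append(seen[item] * step[item])
        seen[item] += 1
    return out


col1 = ['x'] * 5 + ['y'] * 7
print(expected_values(col1, {'x': 100, 'y': 150}))
# [0, 100, 200, 300, 400, 0, 150, 300, 450, 600, 750, 900]
```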
We'll start by getting the cumulative count for each item in col1:
df['value'] = df.groupby('col1').cumcount()
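As a quick illustration of what `GroupBy.cumcount` returns, here is a small sketch on a made-up, interleaved column (not the question's data). It numbers each row within its group in order of appearance, starting from 0, so the rows do not need to be sorted by item first.

```python
import pandas as pd

# Hypothetical interleaved data, just to show per-group numbering
df = pd.DataFrame({'col1': ['x', 'y', 'x', 'y', 'y']})

# cumcount numbers rows within each group, starting at 0, in row order
counts = df.groupby('col1').cumcount()
print(counts.tolist())
# [0, 0, 1, 1, 2]
```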
Next, we multiply each count by the constant for its item:
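# Constant added per occurrence of each item
multiples = {
    'x': 100,
    'y': 150
}

for col, value in multiples.items():
    # Boolean mask selecting the rows for this item
    index = df['col1'] == col
    # Scale those rows' cumulative counts by the item's constant
    df.loc[index, 'value'] *= value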
This gives the final result:
col1 value
0 x 0
1 x 100
2 x 200
3 x 300
4 x 400
5 y 0
6 y 150
7 y 300
8 y 450
9 y 600
10 y 750
11 y 900
EDIT: SNygard beat me to it, but I'd like to present a solution that makes use of pandas' broadcasting and avoids the inefficiency of iteration.
It is generally said that iterating over a DataFrame's rows gives up pandas' efficiency, because it uses the library in a way it was not designed for.
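For contrast, here is a sketch of the row-by-row style that paragraph warns against, written only for comparison; the `step` and `seen` names are illustrative. It produces the same numbers, but every row passes through the Python interpreter via `DataFrame.iterrows`, which is what tends to make this style slow on large frames.

```python
import pandas as pd

step = {'x': 100, 'y': 150}
df = pd.DataFrame({'col1': ['x'] * 5 + ['y'] * 7, 'value': 0})

# Row-by-row loop: each iteration runs in Python, one row at a time
seen = {}
for idx, row in df.iterrows():
    n = seen.get(row['col1'], 0)  # earlier occurrences of this item
    df.at[idx, 'value'] = n * step[row['col1']]
    seen[row['col1']] = n + 1

print(df['value'].tolist())
# [0, 100, 200, 300, 400, 0, 150, 300, 450, 600, 750, 900]
```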
Here's how I would do it:
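import pandas as pd

# Constant added per occurrence of each item
col1_to_value_hash = {
    'x': 100,
    'y': 150
}

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
    'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
})

# 0, 1, 2, ... within each group of col1
cumcount = df.groupby('col1').cumcount()
# Multiply element-wise by each row's constant (the two Series align on the index)
df['value'] = cumcount * df['col1'].apply(lambda x: col1_to_value_hash[x])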
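The `.apply(lambda ...)` call above still invokes a Python function once per row. A possible refinement, offered as a suggestion rather than as part of the original answer, is to pass the dictionary straight to `Series.map`, which performs the lookup without a user-defined lambda and is typically faster on large frames. One behavioral difference to be aware of: an item missing from the dictionary becomes NaN with `map`, whereas the lambda raises `KeyError`.

```python
import pandas as pd

col1_to_value_hash = {'x': 100, 'y': 150}
df = pd.DataFrame({'col1': ['x'] * 5 + ['y'] * 7})

# Series.map with a dict replaces each item by its constant;
# the product with the per-group cumulative count is element-wise.
df['value'] = df.groupby('col1').cumcount() * df['col1'].map(col1_to_value_hash)
print(df['value'].tolist())
# [0, 100, 200, 300, 400, 0, 150, 300, 450, 600, 750, 900]
```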