Update one row's column based on other rows, Pandas
I have a dataframe like the one below:
Time col1 col2 col3
2 a x 10
3 b y 11
1 a x 10
6 c z 12
20 c x 13
23 a y 24
14 c x 13
16 b y 11
...
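I want to add a column to every row based on the other rows of the dataframe. For each row, cumVal should count the rows that share the same col1, col2 and col3 and whose Time is within +/- 10 of that row's Time (the row itself included). This is the desired output: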
Time col1 col2 col3 cumVal
2 a x 10 2
3 b y 11 1
1 a x 10 2
6 c z 12 1
20 c x 13 2
23 a y 24 1
14 c x 13 2
16 b y 11 1
...
My attempt:
df['cumVal'] = 0
for index, row in df.iterrows():
    min1 = row['Time'] - 10
    max1 = row['Time'] + 10
    # rows that share col1, col2 and col3 with the current row
    ndf = df[(df.col1 == row.col1) & (df.col2 == row.col2) & (df.col3 == row.col3)]
    # df.loc writes into df itself; df.iloc[index]['cumVal'] = ... would assign to a temporary copy
    df.loc[index, 'cumVal'] = len(ndf.query('@min1 <= Time <= @max1'))
But it is very slow. Could anybody change my code to make it faster?
You can use groupby on 'col1', 'col2' and 'col3', and in the transform for each group, use np.subtract.outer (the outer method of the np.subtract ufunc) to calculate all pairwise differences between the values in the 'Time' column of that group. Then, by testing where np.abs of those differences is less than or equal to 10 and applying np.sum on axis=0, you can count how many values are within +/- 10 of each value.
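To make the outer-difference step concrete, here is a minimal sketch on a single group. The three Time values are hypothetical and chosen only for illustration; they do not come from the question's data.

```python
import numpy as np

# Hypothetical 'Time' values for one group (not taken from the question).
times = np.array([1, 2, 20])

# diffs[i, j] = times[i] - times[j], i.e. every pairwise difference.
diffs = np.subtract.outer(times, times)

# True where two values are at most 10 apart (each value always matches itself).
within = np.abs(diffs) <= 10

# Summing over axis 0 counts, for each value, how many values fall in its window.
counts = within.sum(axis=0)
print(counts)  # [2 2 1]
```

The values 1 and 2 are within 10 of each other, so each gets a count of 2, while 20 only matches itself.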
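import numpy as np
# Convert each group to a NumPy array first: ufunc.outer may not be supported
# directly on a pandas Series in newer pandas versions.
df['cumVal'] = (df.groupby(['col1','col2','col3'])['Time']
                  .transform(lambda x: (np.abs(np.subtract.outer(x.to_numpy(), x.to_numpy()))<=10).sum(0)))
print(df)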
Time col1 col2 col3 cumVal
0 2.0 a x 10.0 2.0
1 3.0 b y 11.0 1.0
2 1.0 a x 10.0 2.0
3 6.0 c z 12.0 1.0
4 20.0 c x 13.0 2.0
5 23.0 a y 24.0 1.0
6 14.0 c x 13.0 2.0
7 16.0 b y 11.0 1.0
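For anyone who wants to reproduce this, the following is a self-contained sketch that rebuilds only the eight rows shown in the question (the elided "..." rows are left out) and applies the same groupby/transform idea. The helper name count_within and its window parameter are my own, introduced for illustration.

```python
import numpy as np
import pandas as pd

# The eight visible rows from the question; the elided rows are not included.
df = pd.DataFrame({
    "Time": [2, 3, 1, 6, 20, 23, 14, 16],
    "col1": ["a", "b", "a", "c", "c", "a", "c", "b"],
    "col2": ["x", "y", "x", "z", "x", "y", "x", "y"],
    "col3": [10, 11, 10, 12, 13, 24, 13, 11],
})

def count_within(times, window=10):
    """For each time in a group, count the group members within +/- window."""
    t = times.to_numpy()  # plain NumPy array, so np.subtract.outer applies directly
    return (np.abs(np.subtract.outer(t, t)) <= window).sum(axis=0)

df["cumVal"] = df.groupby(["col1", "col2", "col3"])["Time"].transform(count_within)
print(df)
```

On these eight rows the resulting cumVal column is 2, 1, 2, 1, 2, 1, 2, 1, which agrees with the desired output in the question.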
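This should give better performance:
df['cumVal'] = 0
for index, row in df.iterrows():
    min1 = row['Time'] - 10
    max1 = row['Time'] + 10
    # filter on the time window and the three key columns in a single boolean mask
    # (bounds are inclusive, matching the window in the question)
    ndf = df[(df.Time >= min1) & (df.Time <= max1) & (df.col1 == row.col1)
             & (df.col2 == row.col2) & (df.col3 == row.col3)]
    # use df.loc so the assignment modifies df rather than a temporary copy
    df.loc[index, 'cumVal'] = len(ndf)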
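If individual groups get large, both the row-by-row loop and the outer-difference matrix do work that grows quadratically with the number of rows involved. One possible alternative, which is a different technique from either answer above, is to sort each group's times and locate the window edges with np.searchsorted. This is only a sketch: the helper name count_within_sorted is hypothetical, and I have not benchmarked it against the other approaches.

```python
import numpy as np
import pandas as pd

def count_within_sorted(times, window=10):
    """For each time in a group, count members within +/- window using binary search."""
    t = times.to_numpy()
    s = np.sort(t)
    # Position of the first sorted value >= t - window.
    lo = np.searchsorted(s, t - window, side="left")
    # Position just past the last sorted value <= t + window.
    hi = np.searchsorted(s, t + window, side="right")
    return hi - lo

# The eight visible rows from the question; the elided rows are not included.
df = pd.DataFrame({
    "Time": [2, 3, 1, 6, 20, 23, 14, 16],
    "col1": ["a", "b", "a", "c", "c", "a", "c", "b"],
    "col2": ["x", "y", "x", "z", "x", "y", "x", "y"],
    "col3": [10, 11, 10, 12, 13, 24, 13, 11],
})

df["cumVal"] = df.groupby(["col1", "col2", "col3"])["Time"].transform(count_within_sorted)
print(df)
```

Sorting costs O(n log n) per group and avoids building an n-by-n matrix, so memory stays linear in the group size. The window is inclusive on both ends, like the query in the question.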