
Update one row's column based on other rows, Pandas

I have a DataFrame like the one below:

Time   col1  col2  col3 
2      a     x     10
3      b     y     11
1      a     x     10
6      c     z     12
20     c     x     13
23     a     y     24
14     c     x     13     
16     b     y     11
...

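I want to add a column, cumVal, to every row based on the other rows of the DataFrame: for each row it counts the rows that have the same col1, col2 and col3 values and a Time within +/- 10 of that row's Time (the row itself included). This is the output DataFrame I want: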

Time   col1  col2  col3 cumVal
2      a     x     10   2
3      b     y     11   1
1      a     x     10   2
6      c     z     12   1
20     c     x     13   2
23     a     y     24   1
14     c     x     13   2
16     b     y     11   1
...
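To make the definition of cumVal concrete, here is a minimal sketch that rebuilds the sample frame and computes the column with a self-merge on the three key columns. The self-merge is a swapped-in technique that neither answer in this thread uses, and the names keys, pairs and close are only illustrative. It assumes the sample rows above are the whole frame; note that the merged frame holds one row per same-key pair, so it grows with the square of each group's size.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Time": [2, 3, 1, 6, 20, 23, 14, 16],
    "col1": ["a", "b", "a", "c", "c", "a", "c", "b"],
    "col2": ["x", "y", "x", "z", "x", "y", "x", "y"],
    "col3": [10, 11, 10, 12, 13, 24, 13, 11],
})

keys = ["col1", "col2", "col3"]

# Pair every row with every row sharing its keys (self-merge).
# reset_index() keeps the original row label in a column named "index".
pairs = df.reset_index().merge(df[keys + ["Time"]], on=keys, suffixes=("", "_other"))

# Keep pairs whose Times differ by at most 10 (inclusive, like the question's query)
close = pairs[(pairs["Time"] - pairs["Time_other"]).abs() <= 10]

# Count matching pairs per original row; each row matches itself, so counts are >= 1
df["cumVal"] = close.groupby("index").size()
print(df)
```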

Here is my attempt:

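df['cumVal'] = 0
for index, row in df.iterrows():
    # Time window of +/- 10 around the current row
    min1 = row['Time'] - 10
    max1 = row['Time'] + 10
    # rows with the same col1, col2 and col3 values
    ndf = df[(df.col1 == row.col1) & (df.col2 == row.col2) & (df.col3 == row.col3)]
    # df.at writes into df itself; df.iloc[index]['cumVal'] = ... assigns to a copy
    df.at[index, 'cumVal'] = len(ndf.query('@min1 <= Time <= @max1'))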

But it is very slow. Can anybody change my code to make it faster?

You can use groupby on 'col1', 'col2' and 'col3' and, in the transform for each group, use np.subtract.outer to calculate all the differences between the 'Time' values of that group. Then, with np.abs(...) <= 10 and a sum along axis=0, you can count, for each value, how many values of the group are within +/- 10 of it (the value itself included).

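import numpy as np
# For each (col1, col2, col3) group: build the matrix of pairwise Time differences,
# mark the pairs within +/- 10, and sum each column to get the count for each row.
# .to_numpy() is used because newer pandas versions warn about or reject ufunc.outer on a Series.
df['cumVal'] = (df.groupby(['col1', 'col2', 'col3'])['Time']
                  .transform(lambda x: (np.abs(np.subtract.outer(x.to_numpy(), x.to_numpy())) <= 10).sum(0)))
print(df)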
   Time col1 col2  col3  cumVal
0   2.0    a    x  10.0     2.0
1   3.0    b    y  11.0     1.0
2   1.0    a    x  10.0     2.0
3   6.0    c    z  12.0     1.0
4  20.0    c    x  13.0     2.0
5  23.0    a    y  24.0     1.0
6  14.0    c    x  13.0     2.0
7  16.0    b    y  11.0     1.0
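To see what the transform computes for one group, here is a small sketch on a hypothetical group of three Time values (1, 5 and 14, made up for illustration rather than taken from the sample data). The diagonal of the difference matrix is always 0, which is why each row counts itself.

```python
import numpy as np

# Hypothetical Time values of a single (col1, col2, col3) group
x = np.array([1, 5, 14])

# diffs[i, j] = x[i] - x[j]: every pairwise difference within the group
diffs = np.subtract.outer(x, x)

# True where the two Times are within +/- 10 of each other
within = np.abs(diffs) <= 10

# Column sums: for each value, how many group members lie within +/- 10 of it
counts = within.sum(0)
print(counts)  # [2 3 2]
```

Because the matrix is n by n, memory and time grow quadratically with the size of each group; for the small groups in the question that is not a concern.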

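A second answer combines all the conditions into a single boolean mask, which avoids the separate query() call per row and should be somewhat faster, although it still loops over every row in Python: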

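df['cumVal'] = 0
for index, row in df.iterrows():
    min1 = row['Time'] - 10
    max1 = row['Time'] + 10
    # one mask: inside the inclusive +/- 10 window and same col1, col2, col3
    ndf = df[(df.Time >= min1) & (df.Time <= max1) &
             (df.col1 == row.col1) & (df.col2 == row.col2) & (df.col3 == row.col3)]
    # df.at writes into df itself; df.iloc[index]['cumVal'] = ... assigns to a copy
    df.at[index, 'cumVal'] = len(ndf)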
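If some groups are large, both loops above and the n-by-n matrix in the first answer can get expensive. A possible alternative, not proposed in either answer, is to sort each group's Time values once and use np.searchsorted to find both window edges, which costs O(n log n) per group. This is a sketch under the assumption that Time is numeric; count_within and its window parameter are hypothetical names.

```python
import numpy as np
import pandas as pd

def count_within(times: pd.Series, window: float = 10) -> np.ndarray:
    """For each value t in `times`, count the values in [t - window, t + window]."""
    values = times.to_numpy()
    sorted_vals = np.sort(values)
    # index of the first sorted value >= t - window
    lo = np.searchsorted(sorted_vals, values - window, side="left")
    # index of the first sorted value > t + window
    hi = np.searchsorted(sorted_vals, values + window, side="right")
    return hi - lo

# Sample data from the question
df = pd.DataFrame({
    "Time": [2, 3, 1, 6, 20, 23, 14, 16],
    "col1": ["a", "b", "a", "c", "c", "a", "c", "b"],
    "col2": ["x", "y", "x", "z", "x", "y", "x", "y"],
    "col3": [10, 11, 10, 12, 13, 24, 13, 11],
})

# transform calls count_within once per group and aligns the result to the rows
df["cumVal"] = df.groupby(["col1", "col2", "col3"])["Time"].transform(count_within)
print(df)
```

Using side="left" for the lower edge and side="right" for the upper edge keeps both bounds inclusive, matching the question's min1 <= Time <= max1.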

