Find data from row in Pandas DataFrame based upon calculated value?
As an extension of my previous question, I would like to take a DataFrame like the one below and find the correct row from which to pull data from column C and place it into column D, based upon the following criteria:
B_new = 2*A_old - B_old, i.e. the new row needs to have a B equal to the following result from the old row: 2*A - B.
A is the same, i.e. A in the new row should have the same value as the old row.
If no matching row exists, the result should be NaN.
Code:
import pandas as pd
a = [2,2,2,3,3,3,3]
b = [1,2,3,1,3,4,5]
c = [0,1,2,3,4,5,6]
df = pd.DataFrame({'A': a , 'B': b, 'C':c})
print(df)
A B C
0 2 1 0
1 2 2 1
2 2 3 2
3 3 1 3
4 3 3 4
5 3 4 5
6 3 5 6
Desired output:
A B C D
0 2 1 0 2.0
1 2 2 1 1.0
2 2 3 2 0.0
3 3 1 3 6.0
4 3 3 4 4.0
5 3 4 5 NaN
6 3 5 6 3.0
Based upon the solutions in my previous question, I've come up with a method that uses a for loop to move through each unique value of A:
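for i in df.A.unique():
    # Map B -> C for the rows in this A group
    mapping = dict(df[df.A==i][['B', 'C']].values)
    # Look up C at B_new = 2*A - B; values with no match become NaN
    df.loc[df.A==i,'D'] = (2 * df[df.A==i]['A'] - df[df.A==i]['B']).map(mapping)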
However, this seems clunky, and I suspect there is a better way that doesn't use for loops, which in my experience tend to be slow.
Question: What's the fastest way to accomplish this transfer of data within the DataFrame?
You could compute the new B and left-merge the result back onto df on A and B:
In [370]: (df[['A', 'C']].assign(B=2*df.A - df.B)
              .merge(df, how='left', on=['A', 'B'])
              .assign(B=df.B)
              .rename(columns={'C_x': 'C', 'C_y': 'D'}))
Out[370]:
A C B D
0 2 0 1 2.0
1 2 1 2 1.0
2 2 2 3 0.0
3 3 3 1 6.0
4 3 4 3 4.0
5 3 5 4 NaN
6 3 6 5 3.0
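For anyone who wants to paste this into a script rather than an IPython session, here is a minimal standalone sketch of the same left-merge idea. The names targets, lookup, matched and result are my own, not from the answer, and the sketch assumes each (A, B) pair occurs only once in df (validate='many_to_one' makes the merge raise if that is not true).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 2, 2, 3, 3, 3, 3],
    'B': [1, 2, 3, 1, 3, 4, 5],
    'C': [0, 1, 2, 3, 4, 5, 6],
})

# Left side: each row's A together with the B value it should look up.
targets = pd.DataFrame({'A': df['A'], 'B': 2 * df['A'] - df['B']})

# Right side: the (A, B) -> C lookup table, with C renamed to D.
lookup = df[['A', 'B', 'C']].rename(columns={'C': 'D'})

# A left merge keeps the left rows in order; keys with no match give NaN.
# Assumption: (A, B) pairs are unique in df, enforced by validate.
matched = targets.merge(lookup, how='left', on=['A', 'B'],
                        validate='many_to_one')

# Attach D positionally so the original A, B, C columns stay untouched.
result = df.copy()
result['D'] = matched['D'].to_numpy()
print(result)
```

This keeps the columns in A, B, C, D order, so the .assign(B=df.B) and .rename steps are not needed.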
Details:
In [372]: df[['A', 'C']].assign(B=2*df.A - df.B)
Out[372]:
A C B
0 2 0 3
1 2 1 2
2 2 2 1
3 3 3 5
4 3 4 3
5 3 5 2
6 3 6 1
In [373]: df[['A', 'C']].assign(B=2*df.A - df.B).merge(df, how='left', on=['A', 'B'])
Out[373]:
A C_x B C_y
0 2 0 3 2.0
1 2 1 2 1.0
2 2 2 1 0.0
3 3 3 5 6.0
4 3 4 3 4.0
5 3 5 2 NaN
6 3 6 1 3.0
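The intermediate frames above show that the merge is really just a lookup of C by the key (A, 2*A - B). As a hedged alternative, not part of the answer above, the same lookup can be sketched with a MultiIndex and reindex. The names lookup and keys are illustrative, and this again assumes (A, B) pairs are unique in df, because reindex fails on duplicate index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 2, 2, 3, 3, 3, 3],
    'B': [1, 2, 3, 1, 3, 4, 5],
    'C': [0, 1, 2, 3, 4, 5, 6],
})

# Series of C values indexed by the (A, B) pair of each row.
# Assumption: every (A, B) pair is unique.
lookup = df.set_index(['A', 'B'])['C']

# The key each row should look up: same A, with B_new = 2*A - B.
keys = pd.MultiIndex.from_arrays(
    [df['A'].to_numpy(), (2 * df['A'] - df['B']).to_numpy()],
    names=['A', 'B'],
)

# reindex returns NaN for keys that do not exist in the lookup.
df['D'] = lookup.reindex(keys).to_numpy()
print(df)
```

I have not timed this against the merge, so treat it as another option to benchmark on your own data rather than a guaranteed speedup.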