Find data from row in Pandas DataFrame based upon calculated value?

As an extension of my previous question, I would like to take a DataFrame like the one below and, for each row, find the correct row from which to pull data from column C and place it into column D, based upon the following criteria:

  1. B_new = 2*A_old - B_old, i.e. the new row needs to have a B equal to the following result from the old row: 2*A - B.
  2. Where A is the same, i.e. A in the new row should have the same value as the old row.
  3. Any values not found should use a NaN result.

Code:

import pandas as pd
a = [2,2,2,3,3,3,3]
b = [1,2,3,1,3,4,5]
c = [0,1,2,3,4,5,6]

df = pd.DataFrame({'A': a , 'B': b, 'C':c})
print(df)

   A  B  C
0  2  1  0
1  2  2  1
2  2  3  2
3  3  1  3
4  3  3  4
5  3  4  5
6  3  5  6

Desired output:

   A  B  C    D
0  2  1  0  2.0
1  2  2  1  1.0
2  2  3  2  0.0
3  3  1  3  6.0
4  3  3  4  4.0
5  3  4  5  NaN
6  3  5  6  3.0
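
For reference, the three criteria can be spelled out as a direct, unoptimized row-by-row check. This is only a sketch for verifying results on small frames, not a proposed solution. The helper name `lookup_d` is hypothetical, and taking the first match when several rows qualify is an assumption the criteria do not cover.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, 2, 2, 3, 3, 3, 3],
    "B": [1, 2, 3, 1, 3, 4, 5],
    "C": [0, 1, 2, 3, 4, 5, 6],
})


def lookup_d(row, frame):
    """Return C from the row of `frame` with the same A and B == 2*A - B."""
    target_b = 2 * row["A"] - row["B"]            # criterion 1
    match = frame[(frame["A"] == row["A"])        # criterion 2
                  & (frame["B"] == target_b)]
    # Criterion 3: no matching row gives NaN.
    # Assumption: if several rows match, the first one is used.
    return match["C"].iloc[0] if not match.empty else np.nan


# Slow (one filter per row), but it encodes the criteria literally.
reference_d = df.apply(lambda row: lookup_d(row, df), axis=1)
print(reference_d.tolist())
```

For the sample data this should reproduce the D column shown above, including the NaN in row 5, where 2*3 - 4 = 2 and no row with A = 3 has B = 2.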

Based upon the solutions in my previous question, I've come up with a method that uses a for loop to move through each unique value of A:

for i in df.A.unique():
    mapping = dict(df[df.A==i][['B', 'C']].values)
    df.loc[df.A==i,'D'] = (2 * df[df.A==i]['A'] - df[df.A==i]['B']).map(mapping)

However, this seems clunky, and I suspect there is a better way that doesn't make use of for loops, which in my prior experience tend to be slow.

Question: What's the fastest way to accomplish this transfer of data within the DataFrame?

You could use a left merge on A and the computed B:

In [370]: (df[['A', 'C']].assign(B=2*df.A - df.B)
           .merge(df, how='left', on=['A', 'B'])
           .assign(B=df.B)
           .rename(columns={'C_x': 'C', 'C_y': 'D'}) )
Out[370]:
   A  C  B    D
0  2  0  1  2.0
1  2  1  2  1.0
2  2  2  3  0.0
3  3  3  1  6.0
4  3  4  3  4.0
5  3  5  4  NaN
6  3  6  5  3.0

Details:

In [372]: df[['A', 'C']].assign(B=2*df.A - df.B)
Out[372]:
   A  C  B
0  2  0  3
1  2  1  2
2  2  2  1
3  3  3  5
4  3  4  3
5  3  5  2
6  3  6  1

In [373]: df[['A', 'C']].assign(B=2*df.A - df.B).merge(df, how='left', on=['A', 'B'])
Out[373]:
   A  C_x  B  C_y
0  2    0  3  2.0
1  2    1  2  1.0
2  2    2  1  0.0
3  3    3  5  6.0
4  3    4  3  4.0
5  3    5  2  NaN
6  3    6  1  3.0
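
If the goal is to add D to the original frame while keeping the A, B, C column order, the same merge idea can be arranged as follows. This is a sketch and assumes each (A, B) pair occurs at most once in the frame. The names `lookup`, `keys` and `matched` are illustrative, and `validate="many_to_one"` is added only to make that assumption fail loudly.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 2, 2, 3, 3, 3, 3],
    "B": [1, 2, 3, 1, 3, 4, 5],
    "C": [0, 1, 2, 3, 4, 5, 6],
})

# Lookup table: one row per (A, B) pair, with C renamed to the target column D.
lookup = df[["A", "B", "C"]].rename(columns={"C": "D"})

# For each row, the key to search for is (A, 2*A - B).
keys = df[["A"]].assign(B=2 * df["A"] - df["B"])

# A left merge keeps every row of `keys` in order; unmatched keys get NaN.
# validate="many_to_one" raises MergeError if (A, B) is duplicated in `lookup`.
matched = keys.merge(lookup, how="left", on=["A", "B"], validate="many_to_one")

# The merge result has a fresh 0..n-1 index, so assign by position
# rather than relying on index alignment with df.
df["D"] = matched["D"].to_numpy()
print(df)
```

Assigning by position avoids depending on df having a default RangeIndex, which the `.assign(B=df.B)` step in the one-liner above relies on.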
