
Pandas Subtracting between two Data Frames

DFOne

 1. ID-1  NumberValueCol1 -- 10
 2. ID-2  NumberValueCol1 -- 11
 3. ID-3  NumberValueCol1 -- 20
 4. ID-4  NumberValueCol1 -- 13
 5. ID-5  NumberValueCol1 -- 15

DFTwo

 1. ID-1  NumberValueCol1 -- 5
 2. ID-2  NumberValueCol1 -- 7
 3. ID-3  NumberValueCol1 -- 9
 4. ID-4  NumberValueCol1 -- 6
 5. ID-5  NumberValueCol1 -- 3

For each value in DFOne.NumberValueCol1, I need to subtract every value in DFTwo from it and find the one that gives the smallest difference.

The first iteration would take the first DFOne.NumberValueCol1 value (10) and subtract every value in DFTwo from it, which would result in:

ID Results (DFOne.NumberValueCol1 value 10 minus each DFTwo.NumberValueCol2 value)

 1. Result - 5
 2. Result - 3
 3. Result - 1
 4. Result - 4
 5. Result - 7

In this case, ID 3 in DFTwo (NumberValueCol2 value 9) yields the smallest difference of 1, so I would like to map this value to the DFOne.NumberValueCol1 value 10.
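The first iteration described above can be sketched as follows. This is only an illustration: the short column names `id` and `value` and the variable names are assumptions made for the example, not names from the real data.

```python
import pandas as pd

# Sample data copied from the question; column names are shortened for illustration.
df_one = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [10, 11, 20, 13, 15]})
df_two = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [5, 7, 9, 6, 3]})

# Subtract every DFTwo value from the first DFOne value (10).
first_value = df_one["value"].iloc[0]
differences = first_value - df_two["value"]

# idxmin returns the index label of the smallest difference.
best_label = differences.idxmin()
best_id = int(df_two.loc[best_label, "id"])
best_diff = int(differences.loc[best_label])

print(differences.tolist())  # [5, 3, 1, 4, 7]
print(best_id, best_diff)    # 3 1
```

Because the subtraction is vectorized over the whole DFTwo column, no explicit loop or cursor is needed for a single iteration.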

The second iteration would start with ID 2, DFOne.NumberValueCol1 value 11. However, instead of starting the subtraction from the beginning of DFTwo.NumberValueCol2, it would start at the next available ID after the previous match. So, since there was a match with ID 3, the next starting point would be ID 4, and it would apply the same logic as the first iteration to get the smallest difference.
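One way to read the whole procedure is as a loop that remembers where the previous match ended. The sketch below is a hedged illustration of that reading, not a definitive implementation: the function name `sequential_closest_match`, the output column names, and the short `id`/`value` columns are hypothetical, and it assumes that "smallest difference" means smallest absolute difference.

```python
import pandas as pd


def sequential_closest_match(df_one, df_two, value_col="value", id_col="id"):
    """Match each df_one row to the closest df_two row after the previous match."""
    matches = []
    start = 0  # position in df_two where the next search begins
    for _, row in df_one.iterrows():
        remaining = df_two.iloc[start:]
        if remaining.empty:
            break  # no DFTwo rows left to match against
        # Assumption: "smallest difference" means smallest absolute difference.
        diffs = (row[value_col] - remaining[value_col]).abs()
        offset = int(diffs.to_numpy().argmin())
        pos = start + offset
        matches.append(
            {
                "df_one_id": int(row[id_col]),
                "df_two_id": int(df_two.iloc[pos][id_col]),
                "difference": diffs.iloc[offset],
            }
        )
        start = pos + 1  # the next search starts right after this match
    return pd.DataFrame(matches, columns=["df_one_id", "df_two_id", "difference"])


# Sample data copied from the question.
df_one = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [10, 11, 20, 13, 15]})
df_two = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [5, 7, 9, 6, 3]})

result = sequential_closest_match(df_one, df_two)
print(result)
```

With the sample data, IDs 1, 2 and 3 of DFOne map to IDs 3, 4 and 5 of DFTwo with differences 1, 5 and 17; the loop then stops because DFTwo has no rows left.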

I hope this is not too confusing. I come from the T-SQL world, so I'm trying to understand how to do this type of calculation using Pandas instead of traditional SQL Server cursors.

Your problem can be summarized as:

  1. Find the maximum value in DFTwo and subtract it from the first value in DFOne.
  2. Using the index of the maximum value in DFTwo, slice DFTwo onwards from that index.
  3. Go back to Step 1, using the next row of DFOne.

A working example:

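import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'value': [10, 11, 20, 13, 15]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'value': [5, 7, 9, 6, 3]})
print("DFTwo")
print(df2)
print('\n')

df_output = []
for i in df1['value']:
    try:
        # Step 1: subtract the largest remaining DFTwo value from the current DFOne value
        max_val = max(df2['value'])
        new_val = i - max_val
        # Step 2: look up the id of that maximum and slice DFTwo from there onwards.
        # iloc is positional, so the id is used as a row offset into the current
        # (already sliced) frame.
        max_index = int(df2.loc[df2['value'] == max_val, 'id'].iloc[0])
        df2 = df2.iloc[max_index:]
        df_output.append((max_index, new_val))
    except ValueError:
        # max() raises ValueError once DFTwo has been sliced down to nothing
        break
print("Output")
print(pd.DataFrame(df_output, columns=['id', 'result']))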

However, without the except clause, we run into the issue that DFTwo eventually becomes empty:

2 -- 1
   id  value
3   4      6
4   5      3
0 -- 5
   id  value
4   5      3
0 -- 17
Empty DataFrame
Columns: [id, value]
Index: []
Traceback (most recent call last):
  File "C:/Users/Tyler/Desktop/pd_test.py", line 11, in <module>
    new_val = i - max(df2['value'])
ValueError: max() arg is an empty sequence

The output with the new except clause:

DFTwo
   id  value
0   1      5
1   2      7
2   3      9
3   4      6
4   5      3


Output
   id  result
0   3       1
1   4       5

Presumably this won't be an issue in your real-world use case, provided DFTwo is large enough to support this slicing. Without more information on the actual business logic, this is my best attempt.
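If DFTwo can run out, one possible variant checks for an empty frame instead of relying on an exception. It also slices by the matched row's position in the remaining frame rather than by its `id`. This is a sketch under assumptions: the names `remaining`, `rows` and `position` are made up for the example, and it assumes DFTwo has unique index labels.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [10, 11, 20, 13, 15]})
df2 = pd.DataFrame({"id": [1, 2, 3, 4, 5], "value": [5, 7, 9, 6, 3]})

remaining = df2
rows = []
for value in df1["value"]:
    if remaining.empty:
        break  # DFTwo is exhausted; stop instead of raising
    # Index label of the largest remaining DFTwo value.
    max_label = remaining["value"].idxmax()
    matched_id = int(remaining.loc[max_label, "id"])
    difference = int(value - remaining.loc[max_label, "value"])
    rows.append((matched_id, difference))
    # Keep only the rows positioned after the matched one
    # (get_loc returns an int because the index labels are assumed unique).
    position = remaining.index.get_loc(max_label)
    remaining = remaining.iloc[position + 1:]

result = pd.DataFrame(rows, columns=["id", "result"])
print(result)
```

With the sample data this variant produces three rows (ids 3, 4 and 5 with results 1, 5 and 17). The extra row appears because ID 5 is still available after ID 4 has been matched, whereas the positional `iloc[max_index:]` slice in the example above skips past it.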
