Conditionally fill column with value from another DataFrame based on row match in Pandas

I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another with my own invoices. As an example:

import pandas as pd
import numpy as np

usdeur = [(pd.Timestamp('20170705'),1.1329),
          (pd.Timestamp('20170706'),1.1385),
          (pd.Timestamp('20170707'),1.1412),
          (pd.Timestamp('20170710'),1.1387),
          (pd.Timestamp('20170711'),1.1405),
          (pd.Timestamp('20170712'),1.1449)]
labels = ['Date', 'Rate']
rates = pd.DataFrame.from_records(usdeur, columns=labels)

transactions = [(pd.Timestamp('20170706'), 'PayPal',     'USD', 100, 1),
                (pd.Timestamp('20170706'), 'Fastspring', 'USD', 200, 1),
                (pd.Timestamp('20170709'), 'Fastspring', 'USD', 100, 1),
                (pd.Timestamp('20170710'), 'EU',         'EUR', 100, 1),
                (pd.Timestamp('20170710'), 'PayPal',     'USD', 200, 1)]
labels = ['Date', 'From', 'Currency', 'Amount', 'Rate']
sales = pd.DataFrame.from_records(transactions, columns=labels)

resulting in:

[image: the rates and sales DataFrames]

I need the sales['Rate'] column filled with the proper exchange rates from rates['Rate'], that is to say:

  • if sales['Currency'] is 'EUR', leave it alone.
  • for each row of sales, find the row in rates with a matching 'Date'; grab that rates['Rate'] value and put it in sales['Rate'].
  • bonus: if there's no matching 'Date' (e.g. during holidays, when the exchange market is closed), check the previous rows until a suitable value is found.

The full result should look like the following (note that row #2 has the rate from 2017-07-07):

[image: the expected result]

I've tried to follow several suggested solutions from other questions, but with no luck. Thank you very much in advance.

You can expand your rates dataframe to include all the dates and then forward fill, create a column called "Currency" in the rates dataframe, and then join the two dataframes on both the date and currency columns.

idx = pd.DataFrame(pd.date_range('2017-07-05', '2017-07-12'),columns=['Date'])
rates = pd.merge(idx,rates,how="left",on="Date")
rates['Currency'] = 'USD'
rates['Rate'] = rates['Rate'].ffill()           

        Date    Rate Currency
0 2017-07-05  1.1329      USD
1 2017-07-06  1.1385      USD
2 2017-07-07  1.1412      USD
3 2017-07-08  1.1412      USD
4 2017-07-09  1.1412      USD
5 2017-07-10  1.1387      USD
6 2017-07-11  1.1405      USD
7 2017-07-12  1.1449      USD

Then do a left join on both columns:

result = pd.merge(sales, rates, how="left", on=["Currency", "Date"])
result['Rate'] = np.where(result['Currency'] == 'EUR', 1, result['Rate_y'])
result = result.drop(['Rate_x', 'Rate_y'], axis=1)

This gives:

        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387
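
The snippet above hard-codes the date range. A minimal sketch of the same calendar-fill-then-join idea with the range derived from the data follows; the helper name `fill_rates_by_merge` and its parameters are hypothetical, and it assumes every sale date falls on or after the first known rate (earlier dates would be left as NaN).

```python
import numpy as np
import pandas as pd

# Sample data from the question.
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
    "Rate": [1, 1, 1, 1, 1],
})


def fill_rates_by_merge(sales, rates, base_currency="EUR", quote_currency="USD"):
    """Hypothetical helper: calendar-fill `rates`, then left-join onto `sales`."""
    # Daily calendar covering every date that appears in either frame.
    start = min(rates["Date"].min(), sales["Date"].min())
    end = max(rates["Date"].max(), sales["Date"].max())
    calendar = pd.DataFrame({"Date": pd.date_range(start, end)})

    # Attach the known rates and carry the last known rate forward over gaps.
    daily = calendar.merge(rates, how="left", on="Date")
    daily["Rate"] = daily["Rate"].ffill()
    daily["Currency"] = quote_currency

    # Drop the placeholder Rate first so the join adds no _x/_y suffixes.
    merged = sales.drop(columns="Rate").merge(
        daily, how="left", on=["Date", "Currency"]
    )
    # Base-currency rows find no match in `daily`; give them the rate 1.
    merged["Rate"] = np.where(merged["Currency"] == base_currency, 1.0, merged["Rate"])
    return merged


result = fill_rates_by_merge(sales, rates)
print(result)
```

Dropping the placeholder column before merging avoids the separate cleanup of `Rate_x` and `Rate_y` afterwards.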

Here are the steps broken down, using pd.merge_asof:

sales = pd.merge_asof(sales, rates, on='Date', direction='backward', allow_exact_matches=True)
sales.loc[sales.From == 'EU', 'Rate_y'] = sales.Rate_x

sales
Out[748]: 
        Date        From Currency  Amount  Rate_x  Rate_y
0 2017-07-06      PayPal      USD     100       1  1.1385
1 2017-07-06  Fastspring      USD     200       1  1.1385
2 2017-07-09  Fastspring      USD     100       1  1.1412
3 2017-07-10          EU      EUR     100       1  1.0000
4 2017-07-10      PayPal      USD     200       1  1.1387

Then:

sales.drop(columns='Rate_x').rename(columns={'Rate_y':'Rate'})
Out[749]: 
        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387
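
One thing to keep in mind with `pd.merge_asof` is that both frames must be sorted by the join key. A compact sketch of the same approach that sorts explicitly and selects the EUR rows by the `Currency` column (rather than by `From`) might look like this; the variable names `sales_sorted` and `matched` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
    "Rate": [1, 1, 1, 1, 1],
})

# merge_asof needs both sides sorted by the key; a stable sort keeps
# same-day invoices in their original order.
sales_sorted = sales.sort_values("Date", kind="mergesort").drop(columns="Rate")
rates_sorted = rates.sort_values("Date", kind="mergesort")

# direction="backward" picks the last rate dated on or before each sale.
matched = pd.merge_asof(sales_sorted, rates_sorted, on="Date", direction="backward")

# EUR invoices need no conversion, so their rate is set back to 1.
matched.loc[matched["Currency"] == "EUR", "Rate"] = 1.0
print(matched)
```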

Here is how I would do it without merge:

1. Fill rates with the missing dates and ffill, as in the other answers, but keep Date as the index.
2. Map this dataframe onto sales, using loc to exclude the rows with EUR.

idx = pd.date_range(rates['Date'].min(), rates['Date'].max())
rates = rates.set_index('Date').reindex(idx).ffill()
sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(rates['Rate'])

    Date        From        Currency    Amount  Rate
0   2017-07-06  PayPal      USD         100     1.1385
1   2017-07-06  Fastspring  USD         200     1.1385
2   2017-07-09  Fastspring  USD         100     1.1412
3   2017-07-10  EU          EUR         100     1.0000
4   2017-07-10  PayPal      USD         200     1.1387

Or you can even do it without changing the rates dataframe:

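# method='ffill' looks back through the dates in rates; reindexing first and
# calling .ffill() afterwards would only copy from the previous sale date
mapper = rates.set_index('Date')['Rate'].reindex(sales['Date'].unique(), method='ffill')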

sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(mapper)
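
A caveat worth checking when reindexing directly to the sale dates: calling `.ffill()` after `reindex` can only copy values from other sale dates, whereas `reindex(..., method='ffill')` looks back through the original rate dates. A minimal sketch comparing the two on the question's data (the variable names are illustrative):

```python
import pandas as pd

# Sample data from the question (only the columns needed here).
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sale_dates = pd.DatetimeIndex(["2017-07-06", "2017-07-09", "2017-07-10"])

rate_by_date = rates.set_index("Date")["Rate"]

# Reindex first, fill afterwards: 2017-07-09 becomes NaN and is then filled
# from the previous *sale* date (2017-07-06).
fill_after = rate_by_date.reindex(sale_dates).ffill()

# Fill during the reindex: 2017-07-09 takes the last *rate* date on or
# before it (2017-07-07).
fill_during = rate_by_date.reindex(sale_dates, method="ffill")

print(fill_after.loc["2017-07-09"])   # 1.1385
print(fill_during.loc["2017-07-09"])  # 1.1412
```

The second form matches the expected result in the question, where the 2017-07-09 invoice takes the rate from 2017-07-07. It requires the index of `rate_by_date` to be sorted.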

Time testing:

wen:      0.011892538983374834
gayatri:  0.13312408898491412
vaishali: 0.009498710976913571
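
The post does not show how these numbers were measured. As a rough sketch, assuming the standard-library `timeit` module and two simplified stand-ins for the approaches above (the function names are hypothetical and the EUR handling is left out), a comparison could be set up like this:

```python
import timeit

import pandas as pd

# Sample data from the question.
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
    "Rate": [1, 1, 1, 1, 1],
})


def with_merge_asof():
    # Backward as-of join on the (already sorted) Date columns.
    return pd.merge_asof(sales.drop(columns="Rate"), rates, on="Date")


def with_map():
    # Build a date -> rate lookup, forward-filled against the rate dates.
    dates = pd.DatetimeIndex(sales["Date"].unique())
    lookup = rates.set_index("Date")["Rate"].reindex(dates, method="ffill")
    return sales["Date"].map(lookup)


for fn in (with_merge_asof, with_map):
    # Average over repeated calls to smooth out noise.
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{fn.__name__}: {seconds:.6f} s per call")
```

On a frame this small the timings mostly reflect fixed overhead, so a larger sample would be needed before drawing conclusions.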
