How can I connect two dataframes based on multiple criteria in Python?

I have the following two dataframes:

Shortlist:

ticker     date     open    high    low     close   volume
ABC     2000-12-29  0.450   0.455   0.445   0.455   205843.0
ABC     2001-01-31  0.410   0.410   0.405   0.410   381500.0
ABC     2001-02-28  0.380   0.405   0.380   0.400   318384.0
...
ABC     2001-06-30  0.430   0.445   0.430   0.440   104016.0

MCap:

Code    EOM       mcRank    MktCap
ABC    29/12/2000   74     1563.967892
ABC    31/03/2001   98     998.156279
ABC    30/06/2001   59     2035.603350

I now want to create a new table that adds the mcRank and MktCap columns from the MCap dataframe to the Shortlist dataframe, where the Code and the date match. If a date in Shortlist falls between two dates in MCap, it should use the last known date.

The result should look like this:

ticker     date     open    high    low     close   volume    mcRank    MktCap
ABC     2000-12-29  0.450   0.455   0.445   0.455   205843.0   74     1563.967892
ABC     2001-01-31  0.410   0.410   0.405   0.410   381500.0   74     1563.967892   
ABC     2001-02-28  0.380   0.405   0.380   0.400   318384.0   74     1563.967892
...
ABC     2001-06-30  0.430   0.445   0.430   0.440   104016.0   59     2035.603350

I've tried pd.concat and pd.merge - but can't seem to get the right results.

What you want to do is:

First, align both date formats. You can process the dates as strings, which makes the exact match easier, as long as both columns end up in the same format (for example YYYY-MM-DD, which also sorts chronologically).

Second, pd.merge them, using left_on, right_on and how='outer' to keep every row from both dataframes and PURPOSELY create NA values.

Then sort by ticker and date, so that "previous" means "earlier in time", and use DataFrame.fillna(method='ffill') to fill each NA from the previous row's value. In recent pandas versions, where the method argument of fillna is deprecated, DataFrame.ffill() does the same thing.
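The three steps above can be sketched as follows. This is only a minimal illustration, assuming the sample rows from the question (trimmed to a few columns) and pandas 1.1 or newer. Renaming the MCap key columns before merging, the indicator column, and the per-ticker groupby are choices made for this sketch and are not part of the answer itself.

```python
import pandas as pd

# Sample rows copied from the question (a subset of the columns, for brevity).
shortlist = pd.DataFrame({
    "ticker": ["ABC", "ABC", "ABC", "ABC"],
    "date": ["2000-12-29", "2001-01-31", "2001-02-28", "2001-06-30"],
    "close": [0.455, 0.410, 0.400, 0.440],
})
mcap = pd.DataFrame({
    "Code": ["ABC", "ABC", "ABC"],
    "EOM": ["29/12/2000", "31/03/2001", "30/06/2001"],
    "mcRank": [74, 98, 59],
    "MktCap": [1563.967892, 998.156279, 2035.603350],
})

# Step 1: bring both date columns to the same dtype.
shortlist["date"] = pd.to_datetime(shortlist["date"], format="%Y-%m-%d")
mcap["EOM"] = pd.to_datetime(mcap["EOM"], format="%d/%m/%Y")

# Step 2: outer merge, so unmatched rows from either side survive with NaN.
# Renaming the keys first (an assumption of this sketch) avoids ending up
# with two half-empty pairs of key columns.
merged = shortlist.merge(
    mcap.rename(columns={"Code": "ticker", "EOM": "date"}),
    on=["ticker", "date"],
    how="outer",
    indicator=True,  # adds a "_merge" column recording each row's origin
)

# Step 3: sort chronologically, then forward-fill within each ticker.
merged = merged.sort_values(["ticker", "date"])
filled = merged.groupby("ticker")[["mcRank", "MktCap"]].ffill()
merged[["mcRank", "MktCap"]] = filled

# Drop the rows that exist only in mcap (here: 2001-03-31).
result = (
    merged[merged["_merge"] != "right_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(result)
```

Grouping by ticker before the forward fill keeps values from leaking from one ticker into the next when the real data holds more than one code.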

Well, this seems like a merge task, but first make sure the EOM and date columns actually have the same dtype (datetime).

shortlist['date'] = pd.to_datetime(shortlist['date'], format='%Y-%m-%d')
MCap['EOM'] = pd.to_datetime(MCap['EOM'], format='%d/%m/%Y')

And then do the merge. This won't work if ticker or Code are the index; if they are, reset the index first, i.e. shortlist.reset_index(inplace=True):

new_df = shortlist.merge(MCap, how='left', left_on=['ticker', 'date'], right_on=['Code', 'EOM']).reset_index()
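Note that an exact-match left merge only fills mcRank and MktCap on rows whose date also appears in MCap, so the in-between months stay NaN. One possible alternative, not mentioned in the answers, is pandas' pd.merge_asof, which matches each row to the most recent earlier-or-equal key. The sketch below assumes the sample rows from the question (trimmed to a few columns); the variable name asof_df is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (a subset of the columns, for brevity).
shortlist = pd.DataFrame({
    "ticker": ["ABC", "ABC", "ABC", "ABC"],
    "date": ["2000-12-29", "2001-01-31", "2001-02-28", "2001-06-30"],
    "close": [0.455, 0.410, 0.400, 0.440],
})
MCap = pd.DataFrame({
    "Code": ["ABC", "ABC", "ABC"],
    "EOM": ["29/12/2000", "31/03/2001", "30/06/2001"],
    "mcRank": [74, 98, 59],
    "MktCap": [1563.967892, 998.156279, 2035.603350],
})

shortlist["date"] = pd.to_datetime(shortlist["date"], format="%Y-%m-%d")
MCap["EOM"] = pd.to_datetime(MCap["EOM"], format="%d/%m/%Y")

# merge_asof requires both frames to be sorted by their "on" key.
asof_df = pd.merge_asof(
    shortlist.sort_values("date"),
    MCap.sort_values("EOM"),
    left_on="date",
    right_on="EOM",
    left_by="ticker",      # match rows only within the same ticker/Code
    right_by="Code",
    direction="backward",  # take the last MCap row dated on or before "date"
)

# Drop the right-hand key columns if they were carried over.
asof_df = asof_df.drop(columns=["Code", "EOM"], errors="ignore")
print(asof_df)
```

With direction="backward", the January and February rows pick up the 2000-12-29 values, which is the "last known date" behaviour the question asks for.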

IIUC,

you might have to break down the steps: first merge the two dataframes on the dates (I use the join function), then fill in the null values with the values from the oldest date in mcap (I am using your expected output as a guide):

Convert to datetime and set the index:

df['date'] = pd.to_datetime(df['date'], format = '%Y-%m-%d')
df = df.set_index('date')
mcap['EOM'] = pd.to_datetime(mcap['EOM'], format = '%d/%m/%Y')
mcap = mcap.set_index("EOM")

Combine the dataframes:

res = df.join(mcap)

Get the indices of the rows that contain nulls:

indices = res[res.isna().any(axis=1)].index

Get the values from mcap for the oldest date:

latest_mcap = mcap.loc[mcap.index.min()].tolist()

Assign latest_mcap to the null values in res:

res.loc[indices,['Code','mcRank','MktCap']] = latest_mcap

            ticker  open   high    low  close    volume  Code  mcRank       MktCap
date
2000-12-29     ABC  0.45  0.455  0.445  0.455  205843.0   ABC    74.0  1563.967892
2001-01-31     ABC  0.41  0.410  0.405  0.410  381500.0   ABC    74.0  1563.967892
2001-02-28     ABC  0.38  0.405  0.380  0.400  318384.0   ABC    74.0  1563.967892
2001-06-30     ABC  0.43  0.445  0.430  0.440  104016.0   ABC    59.0  2035.603350
