Conditional merging of Pandas DataFrames in Python

I have 2 DataFrames that currently look like this:

import pandas as pd
raw_data = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16','2017-03-17']}
df1 = pd.DataFrame(raw_data,columns=['SeriesDate'])
# Convert the column of df1 itself (the original referenced an undefined `df`)
df1['SeriesDate'] = pd.to_datetime(df1['SeriesDate'])
print(df1)

 SeriesDate
0 2017-03-10
1 2017-03-13
2 2017-03-14
3 2017-03-15
4 2017-03-16
5 2017-03-17

raw_data2 = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16'],'NewSeriesDate':['2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-14']}
df2 = pd.DataFrame(raw_data2,columns=['SeriesDate','NewSeriesDate'])
# Convert the column of df2 itself (the original referenced an undefined `df`)
df2['SeriesDate'] = pd.to_datetime(df2['SeriesDate'])
print(df2)

SeriesDate NewSeriesDate
0 2017-03-10    2017-03-11
1 2017-03-13    2017-03-12
2 2017-03-14    2017-03-13
3 2017-03-15    2017-03-14
4 2017-03-16    2017-03-14

1) I want to join the DataFrames so that, for every "SeriesDate" in df1 before March 15, the "NewSeriesDate" value is taken from df2.

2) For any "SeriesDate" in df1 after March 15, or any "SeriesDate" that does not exist in df2, "NewSeriesDate" should be calculated as follows:

from pandas.tseries.offsets import BDay
# Previous business day for each date
df1['NewSeriesDate'] = df1['SeriesDate'] - BDay(1)

For example, in this case my final DataFrame would look like this:

raw_data3 = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16','2017-03-17'],'NewSeriesDate':['2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-15','2017-03-16']}
finaldf = pd.DataFrame(raw_data3,columns=['SeriesDate','NewSeriesDate'])
# Convert the column of finaldf itself (the original referenced an undefined `df`)
finaldf['SeriesDate'] = pd.to_datetime(finaldf['SeriesDate'])
print(finaldf)

 SeriesDate NewSeriesDate
0 2017-03-10    2017-03-11
1 2017-03-13    2017-03-12
2 2017-03-14    2017-03-13
3 2017-03-15    2017-03-14
4 2017-03-16    2017-03-15
5 2017-03-17    2017-03-16

I am new to Pandas, so I am not sure how to apply a conditional merge. Can anyone provide some insight?

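Try this. It could probably be cleaner, but it does the job. You didn't specify what should happen when the date is exactly March 15, so I made an assumption (March 15 itself takes its value from df2). I may have switched some column names, but you get the idea: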

import pandas as pd
from pandas.tseries.offsets import BDay
import numpy as np

df1 = pd.DataFrame({
    'SeriesDate':pd.to_datetime(['3/10/17','3/13/17','3/14/17','3/15/17','3/16/17','3/17/17']),
    })

# Use NaT rather than np.nan so the placeholder column has a datetime dtype
df1['NewSeries'] = pd.NaT

df2 = pd.DataFrame({
    'SeriesDate':pd.to_datetime(['3/10/17','3/13/17','3/14/17','3/15/17','3/16/17']),
    'NewSeries':pd.to_datetime(['3/11/17','3/12/17','3/13/17','3/14/17','3/14/17'])
    })

d = pd.to_datetime('3/15/17')

df1.loc[df1['SeriesDate'] <= d]  = df1.loc[df1['SeriesDate'] <= d].set_index('SeriesDate') \
    .combine_first(df2.loc[df2['SeriesDate'] <= d].set_index('SeriesDate')).reset_index()

df1.loc[df1['SeriesDate'] > d, 'NewSeries'] = df1['SeriesDate'] - BDay(1)
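
The same two rules can also be written as a left merge followed by a conditional fill. This is only a minimal sketch, not the approach from the answer above: it keeps the question's column name `NewSeriesDate`, and `cutoff`, `merged`, `fallback` and `use_df2` are names invented here. It makes the same assumption as the answer that March 15 itself takes its value from df2.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

df1 = pd.DataFrame({
    'SeriesDate': pd.to_datetime(['2017-03-10', '2017-03-13', '2017-03-14',
                                  '2017-03-15', '2017-03-16', '2017-03-17']),
})
df2 = pd.DataFrame({
    'SeriesDate': pd.to_datetime(['2017-03-10', '2017-03-13', '2017-03-14',
                                  '2017-03-15', '2017-03-16']),
    'NewSeriesDate': pd.to_datetime(['2017-03-11', '2017-03-12', '2017-03-13',
                                     '2017-03-14', '2017-03-14']),
})

# Assumption: dates on or before the cutoff take their value from df2.
cutoff = pd.Timestamp('2017-03-15')

# A left merge keeps every row of df1; dates missing from df2 get NaT.
merged = df1.merge(df2, on='SeriesDate', how='left')

# Fallback for every row: the previous business day.
fallback = merged['SeriesDate'] - BDay(1)

# Keep the df2 value only on or before the cutoff and only where one exists;
# everywhere else, substitute the fallback.
use_df2 = (merged['SeriesDate'] <= cutoff) & merged['NewSeriesDate'].notna()
merged['NewSeriesDate'] = merged['NewSeriesDate'].where(use_df2, fallback)

print(merged)
```

With the sample data, the rows for 2017-03-16 and 2017-03-17 fall after the cutoff, so they get the previous business day (2017-03-15 and 2017-03-16), which matches the final DataFrame shown in the question.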
