
Merge pandas dataframe on time and another column

I have two pandas dataframes that I am trying to combine into a single dataframe. This is how I set them up:

import pandas as pd

a = {'date':['1/1/2015 00:00','1/1/2015 00:15','1/1/2015 00:30'], 'num':[1,2,3]}
b = {'date':['1/1/2015 01:15','1/1/2015 01:30','1/1/2015 01:45'], 'num':[4,5,6]}

dfa = pd.DataFrame(a)
dfb = pd.DataFrame(b)

# Parse the date strings into datetime64 values
dfa['date'] = pd.to_datetime(dfa['date'])
dfb['date'] = pd.to_datetime(dfb['date'])

Then I find the earliest and latest timestamps across both dataframes and create a new dataframe that starts out with only a date series:

earliest = min(dfa['date'].min(), dfb['date'].min())
latest = max(dfa['date'].max(), dfb['date'].max())

date_range = pd.date_range(earliest, latest, freq='15min')

dfd = pd.DataFrame({'date':date_range})

Then I want to merge everything into one dataframe built on dfd, since it contains all of the proper timestamps. So I merge dfd and dfa, and everything is fine:

dfd = pd.merge(dfd, dfa, how = 'outer', on = 'date')

However, when I merge the result with dfb, the date series gets messed up and I can't figure out why.

dfd = pd.merge(dfd, dfb, how = 'outer', on = ['date','num'])

...yields:

                  date  num
0  2015-01-01 00:00:00  1.0
1  2015-01-01 00:15:00  2.0
2  2015-01-01 00:30:00  3.0
3  2015-01-01 00:45:00  NaN
4  2015-01-01 01:00:00  NaN
5  2015-01-01 01:15:00  NaN
6  2015-01-01 01:30:00  NaN
7  2015-01-01 01:45:00  NaN
8  2015-01-01 01:15:00  4.0
9  2015-01-01 01:30:00  5.0
10 2015-01-01 01:45:00  6.0

I was hoping that 4.0 would fill in the 2015-01-01 01:15:00 time slot, and so on, without new rows being created.

Alternatively, if I try:

dfd = pd.merge(dfd, dfb, how = 'outer', on = 'date')

I get:

                 date  num_x  num_y
0 2015-01-01 00:00:00    1.0    NaN
1 2015-01-01 00:15:00    2.0    NaN
2 2015-01-01 00:30:00    3.0    NaN
3 2015-01-01 00:45:00    NaN    NaN
4 2015-01-01 01:00:00    NaN    NaN
5 2015-01-01 01:15:00    NaN    4.0
6 2015-01-01 01:30:00    NaN    5.0
7 2015-01-01 01:45:00    NaN    6.0

That isn't what I want either (I only want a single num column). Any help would be appreciated.

# '15min' is used instead of the older '15T' alias, which is deprecated in recent pandas
dfa.set_index('date').combine_first(dfb.set_index('date')) \
    .asfreq('15min').reset_index()

                 date    num
0 2015-01-01 00:00:00 1.0000
1 2015-01-01 00:15:00   2.00
2 2015-01-01 00:30:00   3.00
3 2015-01-01 00:45:00    nan
4 2015-01-01 01:00:00    nan
5 2015-01-01 01:15:00   4.00
6 2015-01-01 01:30:00   5.00
7 2015-01-01 01:45:00   6.00
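
The one-liner above can be easier to follow when split into steps. The following is a minimal sketch using the question's data; the intermediate names `combined` and `regular` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Data from the question
dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})

# Step 1: index both frames by timestamp. combine_first builds the union of
# the two indexes and keeps dfa's value wherever both frames have one.
combined = dfa.set_index('date').combine_first(dfb.set_index('date'))

# Step 2: asfreq reindexes onto a regular 15-minute grid between the first
# and last timestamp, so the missing slots appear as NaN rows.
regular = combined.asfreq('15min')

# Step 3: turn the index back into an ordinary 'date' column.
result = regular.reset_index()
print(result)
```

Since `asfreq` derives the grid from the first and last timestamps in the index, the separate `dfd` frame is not needed with this approach.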

Another solution

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([dfa, dfb]).set_index('date').asfreq('15min').reset_index()
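
The concatenation approach relies on every timestamp appearing only once, because `asfreq` reindexes and cannot do so on an index with duplicate labels. Below is a minimal sketch written with `pd.concat` (`DataFrame.append` was removed in pandas 2.0). The `drop_duplicates` guard is an addition for illustration, not part of the original answer; it is a no-op for the question's data, where the two frames do not overlap.

```python
import pandas as pd

# Data from the question
dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})

# Stack the two frames on top of each other
stacked = pd.concat([dfa, dfb], ignore_index=True)

# Assumed guard: if a timestamp appeared in both frames, keep the first
# occurrence (dfa's row) so that the index is unique before asfreq.
stacked = stacked.drop_duplicates(subset='date', keep='first')

# Reindex onto a regular 15-minute grid; missing slots become NaN
result = stacked.set_index('date').sort_index().asfreq('15min').reset_index()
print(result)
```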

First merge dfa and dfb:

d = pd.merge(dfa, dfb, on=['date','num'], how='outer')

Then merge the result with the dfd you already defined:

result = pd.merge(d, dfd, on='date', how='outer')
# DataFrame.sort was removed from pandas; sort_values is the replacement
print(result.sort_values('date'))

Output:

                 date  num
0 2015-01-01 00:00:00  1.0
1 2015-01-01 00:15:00  2.0
2 2015-01-01 00:30:00  3.0
6 2015-01-01 00:45:00  NaN
7 2015-01-01 01:00:00  NaN
3 2015-01-01 01:15:00  4.0
4 2015-01-01 01:30:00  5.0
5 2015-01-01 01:45:00  6.0
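
The order of the merges matters because an outer merge on `['date', 'num']` only pairs rows in which both keys are equal. Once dfd has been merged with dfa, the later time slots carry `num = NaN`, which never equals 4, 5 or 6, so dfb's rows are appended instead of filling those slots. The sketch below contrasts the two orders using the question's data; the `indicator=True` column and the names `step1`, `bad` and `good` are added here for illustration only.

```python
import pandas as pd

# Data from the question
dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})
dfd = pd.DataFrame({
    'date': pd.date_range(dfa['date'].min(), dfb['date'].max(), freq='15min')
})

# Question's order: dfd + dfa first, then dfb on both keys.
step1 = pd.merge(dfd, dfa, how='outer', on='date')
bad = pd.merge(step1, dfb, how='outer', on=['date', 'num'], indicator=True)

# The '_merge' column shows where each row came from: dfb's rows are
# 'right_only' because (date, NaN) never matches (date, 4), and so on.
print(bad['_merge'].value_counts())

# Answer's order: dfa + dfb on both keys, then dfd on date alone.
d = pd.merge(dfa, dfb, how='outer', on=['date', 'num'])
good = (pd.merge(d, dfd, how='outer', on='date')
          .sort_values('date')
          .reset_index(drop=True))
print(good)
```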

This works:

a = {'date':['1/1/2015 00:00','1/1/2015 00:15','1/1/2015 00:30'], 'num':[1,2,3]}
b = {'date':['1/1/2015 01:15','1/1/2015 01:30','1/1/2015 01:45'], 'num':[4,5,6]}

dfa = pd.DataFrame(a)
dfb = pd.DataFrame(b)

dfa['date'] = dfa['date'].apply(pd.to_datetime)
dfb['date'] = dfb['date'].apply(pd.to_datetime)

earliest = min(dfa['date'].min(), dfb['date'].min())
latest = max(dfa['date'].max(), dfb['date'].max())

date_range = pd.date_range(earliest, latest, freq='15min')

dfd = pd.DataFrame({'date':date_range})


df_dates = pd.merge(dfa, dfb, how = 'outer')
df_final = pd.merge(dfd, df_dates, how = 'outer')

df_final
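
This version omits the `on` argument, in which case `pd.merge` joins on the columns the two frames have in common: `date` and `num` for the first merge, and `date` alone for the second, since dfd has no `num` column. A small sketch that spells the keys out and checks that both forms give the same frame; the names `implicit` and `explicit` are introduced here for illustration.

```python
import pandas as pd

# Data from the question
dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})
dfd = pd.DataFrame({
    'date': pd.date_range(dfa['date'].min(), dfb['date'].max(), freq='15min')
})

# Implicit keys, as in the answer: merge on whatever columns are shared
implicit = pd.merge(dfd, pd.merge(dfa, dfb, how='outer'), how='outer')

# Explicit keys: the same two merges with the join columns written out
df_dates = pd.merge(dfa, dfb, how='outer', on=['date', 'num'])
explicit = pd.merge(dfd, df_dates, how='outer', on='date')

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(implicit, explicit)
print(explicit)
```

Writing the keys explicitly is a little more verbose, but it keeps the result from changing if another shared column is later added to one of the frames.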

