[英]Pandas join/merge 2 dataframes using date as index
I have 2 big dataframes with date as index. 我有2个大数据帧,日期为索引。 To simplyfy an example let's say that they look like this (number of data in particular date in the first dataframe is not the same as in second):
简单地说一个例子,假设它们看起来像这样(第一个数据帧中特定日期的数据数与第二个数据帧中的数据不同):
df1: DF1:
Date X Y
2000-01-01 x1 y1
2000-01-01 x2 y2
2000-01-02 x3 y3
2000-01-03 x4 y4
2000-01-03 x5 y5
2000-01-03 x6 y6
df2: DF2:
Date X_2 Y_2
2000-01-01 X1 Y1
2000-01-01 X2 Y2
2000-01-01 X3 Y3
2000-01-03 X4 Y4
2000-01-03 X5 Y5
The output should look like this (I want merge only data with dates whitch appear in both dataframes): 输出应该如下所示(我想只合并两个数据帧中出现日期的数据):
Date X Y X_2 Y_2
2000-01-01 x1 y1 X1 Y1
2000-01-01 x2 y2 X2 Y2
2000-01-01 NaN NaN X3 Y3
2000-01-03 x4 y4 X4 Y4
2000-01-03 x5 y5 X5 Y5
2000-01-03 x6 y6 NaN NaN
I've tried different code combinations and I keep getting duplicated data like this: 我尝试了不同的代码组合,并且我不断得到像这样的重复数据:
Date X Y X_2 Y_2
2000-01-01 x1 y1 X1 Y1
2000-01-01 x1 y1 X2 Y2
2000-01-01 x1 y1 X3 Y3
2000-01-01 x2 y2 X1 Y1
2000-01-01 x2 y2 X2 Y2
2000-01-01 x2 y2 X3 Y3
I've tried eg result = pd.merge(df1,df2, how='inner', on='Date')
What to do in order to get the result I want? 我试过例如
result = pd.merge(df1,df2, how='inner', on='Date')
怎样做才能得到我想要的结果?
Use cumcount
to number the items in each group, when grouped by Date
: 按
Date
分组时,使用cumcount
对每个组中的项目进行编号:
In [107]: df1['count'] = df1.groupby('Date').cumcount()
In [108]: df1
Out[108]:
Date X Y count
0 2000-01-01 x1 y1 0
1 2000-01-01 x2 y2 1
2 2000-01-02 x3 y3 0
3 2000-01-03 x4 y4 0
4 2000-01-03 x5 y5 1
5 2000-01-03 x6 y6 2
In [109]: df2['count'] = df2.groupby('Date').cumcount()
In [110]: df2
Out[110]:
Date X_2 Y_2 count
0 2000-01-01 X1 Y1 0
1 2000-01-01 X2 Y2 1
2 2000-01-01 X3 Y3 2
3 2000-01-03 X4 Y4 0
4 2000-01-03 X5 Y5 1
By adding the count
column, you can now merge on both Date
and count
which gets you close to the result you want: 通过添加
count
列,您现在可以合并Date
和count
,使您接近所需的结果:
In [111]: pd.merge(df1, df2, on=['Date', 'count'], how='outer')
Out[111]:
Date X Y count X_2 Y_2
0 2000-01-01 x1 y1 0 X1 Y1
1 2000-01-01 x2 y2 1 X2 Y2
2 2000-01-02 x3 y3 0 NaN NaN
3 2000-01-03 x4 y4 0 X4 Y4
4 2000-01-03 x5 y5 1 X5 Y5
5 2000-01-03 x6 y6 2 NaN NaN
6 2000-01-01 NaN NaN 2 X3 Y3
The rows that you wish to remove can be characterized as those where count equals 0 and X or X_2 equals NaN. 您要删除的行可以表示为count等于0且X或X_2等于NaN的行。 Therefore, you could remove those rows with a boolean mask like this:
因此,您可以使用布尔掩码删除这些行,如下所示:
mask = (result['count'] == 0) & pd.isnull(result).any(axis=1)
result = result.loc[~mask]
import pandas as pd
df1 = pd.DataFrame({'Date': ['2000-01-01',
'2000-01-01',
'2000-01-02',
'2000-01-03',
'2000-01-03',
'2000-01-03'],
'X': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'],
'Y': ['y1', 'y2', 'y3', 'y4', 'y5', 'y6']})
df2 = pd.DataFrame({'Date': ['2000-01-01',
'2000-01-01',
'2000-01-01',
'2000-01-03',
'2000-01-03'],
'X_2': ['X1', 'X2', 'X3', 'X4', 'X5'],
'Y_2': ['Y1', 'Y2', 'Y3', 'Y4', 'Y5']})
df1['count'] = df1.groupby('Date').cumcount()
df2['count'] = df2.groupby('Date').cumcount()
result = pd.merge(df1, df2, on=['Date', 'count'], how='outer')
mask = (result['count'] == 0) & pd.isnull(result).any(axis=1)
result = result.loc[~mask]
result = result.drop('count', axis=1)
yields 产量
Date X Y count X_2 Y_2
0 2000-01-01 x1 y1 0 X1 Y1
1 2000-01-01 x2 y2 1 X2 Y2
3 2000-01-03 x4 y4 0 X4 Y4
4 2000-01-03 x5 y5 1 X5 Y5
5 2000-01-03 x6 y6 2 NaN NaN
6 2000-01-01 NaN NaN 2 X3 Y3
Another way to restrict the merge to only those dates which are common to both df1
and df2
would be find the intersection of df1['Date']
and df2['Date']
first, and then apply pd.merge
to sub-DataFrames of df1
and df2
which contain only those dates: 将合并限制为仅
df1
和df2
共同的那些日期的另一种方法是首先找到df1['Date']
和df2['Date']
pd.merge
,然后将pd.merge
应用于子数据框架df1
和df2
只包含那些日期:
import numpy as np
dates = np.intersect1d(df1['Date'], df2['Date'])
mask1 = df1['Date'].isin(dates)
mask2 = df2['Date'].isin(dates)
result = pd.merge(df1.loc[mask1], df2.loc[mask2], on=['Date', 'count'], how='outer')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.