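How to merge two datasets based on conditions
I am trying to merge two datasets in Python based on 3 conditions. They must have the same longitude, the same latitude, and the same month of a specific year. One dataset has a size of about 16k, the other 1.7k. A simple example of the input and expected output is below: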
>df1
long lat date proximity
5 8 23/06/2009 Near
6 10 05/10/2012 Far
8 6 19/02/2010 Near
3 4 30/04/2014 Near
5 8 01/06/2009 Far
>df2
long lat date mine
5 8 10/06/2009 1
8 6 24/02/2010 0
7 2 19/04/2014 1
3 4 30/04/2013 1
If any of the conditions is false, the value in mine should be 0 after merging. How would I merge to get:
long lat date proximity mine
5 8 23/06/2009 Near 1
6 10 05/10/2012 Far 0
8 6 19/02/2010 Near 0
3 4 30/04/2014 Near 0
5 8 01/06/2009 Far 1
The date column is not needed in the output, if that makes it easier.
Here you go:
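import pandas as pd

# Build a year/month key from the day/month/year date strings
df1['year-month'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
df2['year-month'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
# Left merge keeps every row of df1; df2's date column comes through as 'date_r'
joined = df1.merge(df2,
                   how='left',
                   on=['long', 'lat', 'year-month'],
                   suffixes=('', '_r')).drop(columns=['date_r', 'year-month'])
# Rows of df1 without a match in df2 get mine = 0
joined['mine'] = joined['mine'].fillna(0).astype(int)
print(joined)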
Output
long lat date proximity mine
0 5 8 23/06/2009 Near 1
1 6 10 05/10/2012 Far 0
2 8 6 19/02/2010 Near 0
3 3 4 30/04/2014 Near 0
4 5 8 01/06/2009 Far 1
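One thing to be aware of: the snippet above leaves the helper column year-month on df1 and df2 themselves. If that matters, a variation along these lines might work. This is only a sketch: the frames are rebuilt from the question's sample data, the helper `with_year_month` is a made-up name, and it swaps the string key for separate integer year and month columns.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})


def with_year_month(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with integer 'year' and 'month' columns (hypothetical helper)."""
    parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")
    return df.assign(year=parsed.dt.year, month=parsed.dt.month)


keys = ["long", "lat", "year", "month"]
# Only the keys and 'mine' are taken from df2, so no suffixes are needed.
right = with_year_month(df2)[keys + ["mine"]]

joined = (
    with_year_month(df1)
    .merge(right, how="left", on=keys)
    .drop(columns=["year", "month"])
)
joined["mine"] = joined["mine"].fillna(0).astype(int)
print(joined)
```

Because `assign` returns a new frame, df1 and df2 keep their original columns.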
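First extract the month and year from the date column and assign them to a temporary column mon-year, then use DataFrame.merge to left-merge the dataframes df1 and df2 on long, lat and mon-year, then use Series.fillna to fill the NaN values in the mine column with 0, and finally use DataFrame.drop to drop the temporary column mon-year: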
# Everything after the first '/' is the 'mm/yyyy' part of the date
df1['mon-year'] = df1['date'].str.extract(r'/(.*)', expand=False)
df2['mon-year'] = df2['date'].str.extract(r'/(.*)', expand=False)
# OR we can use pd.to_datetime,
# df1['mon-year'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
# df2['mon-year'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
df3 = df1.merge(
    df2.drop(columns='date'),
    on=['long', 'lat', 'mon-year'], how='left').drop(columns='mon-year')
df3['mine'] = df3['mine'].fillna(0)
Result:
# print(df3)
long lat date proximity mine
0 5 8 23/06/2009 Near 1.0
1 6 10 05/10/2012 Far 0.0
2 8 6 19/02/2010 Near 0.0
3 3 4 30/04/2014 Near 0.0
4 5 8 01/06/2009 Far 1.0
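With real data of the sizes mentioned in the question, df2 may contain several rows for the same long, lat and month, in which case a left merge repeats the matching df1 rows. A possible way to guard against that is sketched below. The small frames are hypothetical, and collapsing duplicates with max() (1 if any record in that month is 1) is an assumption, since the question does not say how duplicates should be treated.

```python
import pandas as pd

df1 = pd.DataFrame({
    "long": [5, 8],
    "lat": [8, 6],
    "date": ["23/06/2009", "19/02/2010"],
})
# Hypothetical df2 with two records for the same location and month.
df2 = pd.DataFrame({
    "long": [5, 5, 8],
    "lat": [8, 8, 6],
    "date": ["10/06/2009", "28/06/2009", "24/02/2010"],
    "mine": [0, 1, 0],
})

keys = ["long", "lat", "mon-year"]
df1["mon-year"] = df1["date"].str.extract(r"/(.*)", expand=False)
df2["mon-year"] = df2["date"].str.extract(r"/(.*)", expand=False)

# Collapse df2 to one row per key. Using max() is an assumed rule, not from the question.
df2_unique = df2.groupby(keys, as_index=False)["mine"].max()

# validate='many_to_one' raises MergeError if the right side still has duplicate keys.
df3 = df1.merge(df2_unique, on=keys, how="left", validate="many_to_one")
df3 = df3.drop(columns="mon-year")
df3["mine"] = df3["mine"].fillna(0).astype(int)
print(df3)
```

The `validate` argument does not change the result; it only makes the merge fail loudly instead of silently producing extra rows.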
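You can merge on multiple keys, like this (to match on year and month rather than on the exact date, replace 'date' with a derived year-month column):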
df_1.merge(df_2, how='left', left_on=['long', 'lat', 'date'], right_on=['long', 'lat', 'date'])
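As a rough check of how a multi-key merge on the raw date column behaves, here is a sketch on the question's sample data. It uses `on=`, which is shorthand for `left_on`/`right_on` when the key columns share names. The `ym` column name is made up for illustration.

```python
import pandas as pd

# Sample frames copied from the question.
df_1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df_2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})

# Merging on the exact date: no row of df_2 shares a full date with df_1.
exact = df_1.merge(df_2, how="left", on=["long", "lat", "date"])
print(exact["mine"].isna().all())  # True

# Merging on a month key instead: 'dd/mm/yyyy'[3:] is 'mm/yyyy'.
left = df_1.assign(ym=df_1["date"].str[3:])
right = df_2.assign(ym=df_2["date"].str[3:]).drop(columns="date")
by_month = left.merge(right, how="left", on=["long", "lat", "ym"])
print(by_month["mine"].notna().sum())  # 3
```

On this sample the exact-date merge finds no matches at all, while the month key matches three of the five rows, which is what the expected output in the question requires.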