
How to merge two datasets based on conditions

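I am trying to merge two datasets in Python based on 3 conditions: the rows must have the same longitude, the same latitude, and the same month of the same year. One dataset has about 16k rows and the other about 1.7k. A simple example of the input and the expected output: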

>df1
 long  lat  date        proximity
 5      8   23/06/2009    Near
 6      10  05/10/2012    Far
 8      6   19/02/2010    Near
 3      4   30/04/2014    Near
 5      8   01/06/2009    Far

 >df2
 long  lat  date          mine
 5      8   10/06/2009     1
 8      6   24/02/2010     0
 7      2   19/04/2014     1 
 3      4   30/04/2013     1

If any of the conditions is false, the value in the "mine" column should be 0 after merging. How would I merge to get:

 long  lat  date        proximity  mine
 5      8   23/06/2009    Near      1
 6      10  05/10/2012    Far       0
 8      6   19/02/2010    Near      0
 3      4   30/04/2014    Near      0
 5      8   01/06/2009    Far       1

If it makes things easier, the date column is not needed in the output.

Here you go:

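import pandas as pd

# Build a year/month key so rows match on month and year, not on the exact day
df1['year-month'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
df2['year-month'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')

# A left merge keeps every row of df1; rows without a match get NaN in 'mine'
joined = df1.merge(df2,
                   how='left',
                   on=['long', 'lat', 'year-month'],
                   suffixes=('', '_r')).drop(columns=['date_r', 'year-month'])
# Treat unmatched rows as 0 and restore the integer dtype
joined['mine'] = joined['mine'].fillna(0).astype(int)
print(joined)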

Output

   long  lat        date proximity  mine
0     5    8  23/06/2009      Near     1
1     6   10  05/10/2012       Far     0
2     8    6  19/02/2010      Near     0
3     3    4  30/04/2014      Near     0
4     5    8  01/06/2009       Far     1
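For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frames from the question and applies the same year/month merge. The helper name add_year_month is made up for illustration, and it works on copies so df1 and df2 are left unchanged.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})


def add_year_month(df):
    """Return a copy of df with a 'year-month' key such as '2009/06' (hypothetical helper)."""
    out = df.copy()
    out["year-month"] = pd.to_datetime(out["date"], format="%d/%m/%Y").dt.strftime("%Y/%m")
    return out


left = add_year_month(df1)
# Drop df2's own date so the merge does not produce a second date column
right = add_year_month(df2).drop(columns="date")

joined = (
    left.merge(right, how="left", on=["long", "lat", "year-month"])
        .drop(columns="year-month")
)
# Rows of df1 with no partner in df2 come back as NaN; treat them as 0
joined["mine"] = joined["mine"].fillna(0).astype(int)
print(joined)
```

Dropping the date column from the right-hand frame before merging avoids the need for suffixes altogether.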

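First extract the month and year from the date column and assign them to a temporary column mon-year. Then use DataFrame.merge to left-merge the dataframes df1 and df2 on long, lat and mon-year, use Series.fillna to fill the NaN values in the mine column with 0, and finally use DataFrame.drop to remove the temporary column mon-year.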

# Everything after the first '/' is the month and year, e.g. '06/2009'
df1['mon-year'] = df1['date'].str.extract(r'/(.*)', expand=False)
df2['mon-year'] = df2['date'].str.extract(r'/(.*)', expand=False)

# OR we can use pd.to_datetime,
# df1['mon-year'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
# df2['mon-year'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')

# Use the columns= keyword: the positional axis argument to drop()
# was removed in pandas 2.0
df3 = df1.merge(
    df2.drop(columns='date'),
    on=['long', 'lat', 'mon-year'], how='left').drop(columns='mon-year')

df3['mine'] = df3['mine'].fillna(0)

Result:

# print(df3)

   long  lat        date proximity  mine
0     5    8  23/06/2009      Near   1.0
1     6   10  05/10/2012       Far   0.0
2     8    6  19/02/2010      Near   0.0
3     3    4  30/04/2014      Near   0.0
4     5    8  01/06/2009       Far   1.0
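Two things are worth checking on the real data: the mine column comes back as float, and a left merge repeats a df1 row whenever df2 holds more than one row for the same location and month. The sketch below is one way to handle both. It assumes that "any record with mine = 1 in that month" should win, which the question does not state, and it uses integer year and month columns (names chosen here for illustration) plus one extra df2 row added purely to demonstrate the duplicate case.

```python
import pandas as pd

df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
# Sample df2 plus one extra row (last entry, not in the question) that
# duplicates location (5, 8) in June 2009
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3, 5],
    "lat": [8, 6, 2, 4, 8],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013", "28/06/2009"],
    "mine": [1, 0, 1, 1, 0],
})


def with_year_month(df):
    """Return a copy of df with integer 'year' and 'month' columns (hypothetical helper)."""
    parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")
    return df.assign(year=parsed.dt.year, month=parsed.dt.month)


keys = ["long", "lat", "year", "month"]

# Collapse df2 to one row per key; max() means a single 1 in the month wins (assumption)
right = with_year_month(df2).groupby(keys, as_index=False)["mine"].max()

df3 = (
    with_year_month(df1)
    .merge(right, on=keys, how="left")
    .drop(columns=["year", "month"])
)
df3["mine"] = df3["mine"].fillna(0).astype(int)
print(df3)
```

Because df2 is reduced to one row per key before the merge, df3 always has exactly as many rows as df1.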

You can merge on multiple keys, like this:

df1.merge(df2, how='left', on=['long', 'lat', 'date'])
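Note that merging on the raw date column only pairs rows whose dates are identical to the day, whereas the question asks for the same month and year. The following sketch compares the two keys on the sample data from the question, using the indicator option of DataFrame.merge to count matched rows. The month_year column name is an assumption made for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})

# Key on the exact date: a row matches only if the day is identical too
exact = df1.merge(df2, how="left", on=["long", "lat", "date"], indicator=True)
exact_matches = int((exact["_merge"] == "both").sum())

# Key on month and year: characters after 'dd/' in a dd/mm/yyyy string
left = df1.assign(month_year=df1["date"].str[3:])
right = df2.assign(month_year=df2["date"].str[3:]).drop(columns="date")
monthly = left.merge(right, how="left", on=["long", "lat", "month_year"], indicator=True)
monthly_matches = int((monthly["_merge"] == "both").sum())

print("matched on exact date:", exact_matches)
print("matched on month and year:", monthly_matches)
```

The _merge column is 'both' for rows found in both frames and 'left_only' for df1 rows without a partner, which makes it a convenient check before filling NaN values with 0.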
