
How to merge two datasets based on conditions

I'm trying to merge two datasets in Python based on three conditions: rows must have the same longitude, the same latitude, and the same month of the same year. One dataset is about 16k in size and the other about 1.7k. A simple example of the inputs and expected output follows:

>df1
 long  lat  date        proximity
 5      8   23/06/2009    Near
 6      10  05/10/2012    Far
 8      6   19/02/2010    Near
 3      4   30/04/2014    Near
 5      8   01/06/2009    Far

 >df2
 long  lat  date          mine
 5      8   10/06/2009     1
 8      6   24/02/2010     0
 7      2   19/04/2014     1 
 3      4   30/04/2013     1

If any condition is false, the value of "mine" in the merged result should be 0. How would I merge to get:

 long  lat  date        proximity  mine
 5      8   23/06/2009    Near      1
 6      10  05/10/2012    Far       0
 8      6   19/02/2010    Near      0
 3      4   30/04/2014    Near      0
 5      8   01/06/2009    Far       1

The date column is not necessary in the output if that makes it easier.

Here you go:

df1['year-month'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
df2['year-month'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')

joined = df1.merge(df2,
          how='left',
          on =['long', 'lat', 'year-month'],
          suffixes=['', '_r']).drop(columns = ['date_r', 'year-month'])
joined['mine'] = joined['mine'].fillna(0).astype(int)
print(joined)

Output

   long  lat        date proximity  mine
0     5    8  23/06/2009      Near     1
1     6   10  05/10/2012       Far     0
2     8    6  19/02/2010      Near     0
3     3    4  30/04/2014      Near     0
4     5    8  01/06/2009       Far     1
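To try this end to end, the sample frames from the question can be built by hand. The following is a minimal sketch of the same approach, not a definitive implementation: the helper name merge_on_month is made up for illustration, and it assumes every date is a day/month/year string.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})


def merge_on_month(left, right):
    """Hypothetical helper: left-merge on long, lat and year-month; unmatched rows get mine = 0."""
    # Work on copies so the caller's frames are not modified.
    left = left.copy()
    right = right[["long", "lat", "date", "mine"]].copy()
    for frame in (left, right):
        # Assumes dates are formatted as dd/mm/YYYY.
        frame["year-month"] = pd.to_datetime(frame["date"], format="%d/%m/%Y").dt.strftime("%Y-%m")
    merged = left.merge(
        right.drop(columns="date"),
        how="left",
        on=["long", "lat", "year-month"],
    )
    # Rows without a match hold NaN after the left merge; treat them as 0.
    merged["mine"] = merged["mine"].fillna(0).astype(int)
    return merged.drop(columns="year-month")


joined = merge_on_month(df1, df2)
print(joined)
```

Dropping the right-hand date column before merging avoids the need for suffixes. If df2 could contain several rows for the same long, lat and month, a left merge would repeat the matching df1 row, so it may be worth checking for duplicates first.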

First extract the month and year from the date column into a temporary column mon-year. Then use DataFrame.merge to left-merge df1 and df2 on long, lat and mon-year, use Series.fillna to fill the NaN values in the mine column with 0, and finally use DataFrame.drop to remove the temporary column:

df1['mon-year'] = df1['date'].str.extract(r'/(.*)')
df2['mon-year'] = df2['date'].str.extract(r'/(.*)')

# OR we can use pd.to_datetime,
# df1['mon-year'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
# df2['mon-year'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')

df3 = df1.merge(
    df2.drop(columns='date'),
    on=['long', 'lat', 'mon-year'], how='left').drop(columns='mon-year')

df3['mine'] = df3['mine'].fillna(0)

Result:

# print(df3)

   long  lat        date proximity  mine
0     5    8  23/06/2009      Near   1.0
1     6   10  05/10/2012       Far   0.0
2     8    6  19/02/2010      Near   0.0
3     3    4  30/04/2014      Near   0.0
4     5    8  01/06/2009       Far   1.0
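The mine values print as 1.0 and 0.0 because the left merge puts NaN in unmatched rows, and a column holding NaN is stored as float. If integers are preferred, the column can be cast after filling. A small sketch, using a made-up Series that stands in for the merged column:

```python
import pandas as pd

# Hypothetical 'mine' column as it looks right after a left merge:
# unmatched rows hold NaN, so pandas stores the column as float64.
mine = pd.Series([1.0, None, 0.0, None, 1.0], name="mine")

# Fill the gaps with 0, then cast back to integers.
mine_int = mine.fillna(0).astype(int)
print(mine_int.tolist())
```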

You could merge using multiple keys as follows. Note that using date as a key only matches rows with exactly the same date; to match by month, merge on a derived month column instead:

df1.merge(df2, how='left', on=['long', 'lat', 'date'])
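As a quick check of how an exact-date key behaves, the sketch below runs that kind of merge on the sample data from the question. None of the sample rows share the same long, lat and full date, so every mine value comes back as NaN. The frames are rebuilt by hand here so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "long": [5, 6, 8, 3, 5],
    "lat": [8, 10, 6, 4, 8],
    "date": ["23/06/2009", "05/10/2012", "19/02/2010", "30/04/2014", "01/06/2009"],
    "proximity": ["Near", "Far", "Near", "Near", "Far"],
})
df2 = pd.DataFrame({
    "long": [5, 8, 7, 3],
    "lat": [8, 6, 2, 4],
    "date": ["10/06/2009", "24/02/2010", "19/04/2014", "30/04/2013"],
    "mine": [1, 0, 1, 1],
})

# Merging on the full date string requires identical days, not just identical months.
exact = df1.merge(df2, how="left", on=["long", "lat", "date"])

# True when no row of df1 found a partner in df2.
print(exact["mine"].isna().all())
```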


 