I'm attempting to merge two datasets in Python based on three conditions: rows must have the same longitude, latitude, and month of a specific year. One dataset has about 16k rows and the other about 1.7k. A simple example of the inputs and expected output is as follows:
>df1
long lat date proximity
5 8 23/06/2009 Near
6 10 05/10/2012 Far
8 6 19/02/2010 Near
3 4 30/04/2014 Near
5 8 01/06/2009 Far
>df2
long lat date mine
5 8 10/06/2009 1
8 6 24/02/2010 0
7 2 19/04/2014 1
3 4 30/04/2013 1
If any condition is false, the value in "mine" after the merge should be 0. How would I merge to get:
long lat date proximity mine
5 8 23/06/2009 Near 1
6 10 05/10/2012 Far 0
8 6 19/02/2010 Near 0
3 4 30/04/2014 Near 0
5 8 01/06/2009 Far 1
The date column is not necessary in the output if that makes it easier.
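To reproduce the examples, the sample frames can be built directly. This is a minimal sketch that assumes the dates are stored as dd/mm/yyyy strings, as they are displayed above:

```python
import pandas as pd

# Sample data copied from the question; dates kept as dd/mm/yyyy strings
df1 = pd.DataFrame({
    'long': [5, 6, 8, 3, 5],
    'lat': [8, 10, 6, 4, 8],
    'date': ['23/06/2009', '05/10/2012', '19/02/2010', '30/04/2014', '01/06/2009'],
    'proximity': ['Near', 'Far', 'Near', 'Near', 'Far'],
})
df2 = pd.DataFrame({
    'long': [5, 8, 7, 3],
    'lat': [8, 6, 2, 4],
    'date': ['10/06/2009', '24/02/2010', '19/04/2014', '30/04/2013'],
    'mine': [1, 0, 1, 1],
})
print(df1)
print(df2)
```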
Here you go:
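import pandas as pd
# Build a 'YYYY/MM' key from each dd/mm/yyyy date string
df1['year-month'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
df2['year-month'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%Y/%m')
# Left merge keeps every row of df1; df2's 'date' becomes 'date_r' via the suffix
joined = df1.merge(df2,
                   how='left',
                   on=['long', 'lat', 'year-month'],
                   suffixes=('', '_r')).drop(columns=['date_r', 'year-month'])
# Unmatched rows get NaN in 'mine'; fill with 0 and restore the integer dtype
joined['mine'] = joined['mine'].fillna(0).astype(int)
print(joined)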
Output
long lat date proximity mine
0 5 8 23/06/2009 Near 1
1 6 10 05/10/2012 Far 0
2 8 6 19/02/2010 Near 0
3 3 4 30/04/2014 Near 0
4 5 8 01/06/2009 Far 1
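As a variant (not part of the answer above), the month key can also be kept as two integer columns instead of a formatted string, which avoids any dependence on separators or zero-padding. A minimal sketch, assuming the question's sample data and that every date parses as dd/mm/yyyy; add_year_month is a hypothetical helper:

```python
import pandas as pd

df1 = pd.DataFrame({
    'long': [5, 6, 8, 3, 5],
    'lat': [8, 10, 6, 4, 8],
    'date': ['23/06/2009', '05/10/2012', '19/02/2010', '30/04/2014', '01/06/2009'],
    'proximity': ['Near', 'Far', 'Near', 'Near', 'Far'],
})
df2 = pd.DataFrame({
    'long': [5, 8, 7, 3],
    'lat': [8, 6, 2, 4],
    'date': ['10/06/2009', '24/02/2010', '19/04/2014', '30/04/2013'],
    'mine': [1, 0, 1, 1],
})


def add_year_month(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: parse dd/mm/yyyy and add integer 'year' and 'month' keys
    parsed = pd.to_datetime(df['date'], format='%d/%m/%Y')
    return df.assign(year=parsed.dt.year, month=parsed.dt.month)


keys = ['long', 'lat', 'year', 'month']
joined = (
    add_year_month(df1)
    # Drop df2's own date so only 'mine' is brought across
    .merge(add_year_month(df2).drop(columns='date'), how='left', on=keys)
    .drop(columns=['year', 'month'])
)
# Rows of df1 without a match get NaN; turn those into 0 and keep integers
joined['mine'] = joined['mine'].fillna(0).astype(int)
print(joined)
```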
First, extract the month and year from the date column and assign them to a temporary column mon-year. Then use DataFrame.merge to left-merge df1 and df2 on long, lat and mon-year, use Series.fillna to fill the NaN values in the mine column with 0, and finally use DataFrame.drop to drop the temporary column mon-year:
# Capture everything after the first '/', i.e. 'mm/yyyy'; expand=False returns a Series
df1['mon-year'] = df1['date'].str.extract(r'/(.*)', expand=False)
df2['mon-year'] = df2['date'].str.extract(r'/(.*)', expand=False)
# OR we can use pd.to_datetime,
# df1['mon-year'] = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
# df2['mon-year'] = pd.to_datetime(df2['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
df3 = df1.merge(
    df2.drop(columns='date'),
    on=['long', 'lat', 'mon-year'], how='left').drop(columns='mon-year')
df3['mine'] = df3['mine'].fillna(0)
Result:
# print(df3)
long lat date proximity mine
0 5 8 23/06/2009 Near 1.0
1 6 10 05/10/2012 Far 0.0
2 8 6 19/02/2010 Near 0.0
3 3 4 30/04/2014 Near 0.0
4 5 8 01/06/2009 Far 1.0
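A sketch that extends this answer, assuming the question's sample data. It checks that the regex key groups dates the same way as the commented pd.to_datetime variant, passes merge's validate='many_to_one' so pandas raises MergeError if df2 holds more than one row for the same long, lat and month (the question does not say whether that can happen in the 1.7k-row dataset; if it does, a plain left merge would duplicate rows of df1), and casts mine back to int so the output matches the expected 0/1 integers:

```python
import pandas as pd

df1 = pd.DataFrame({
    'long': [5, 6, 8, 3, 5],
    'lat': [8, 10, 6, 4, 8],
    'date': ['23/06/2009', '05/10/2012', '19/02/2010', '30/04/2014', '01/06/2009'],
    'proximity': ['Near', 'Far', 'Near', 'Near', 'Far'],
})
df2 = pd.DataFrame({
    'long': [5, 8, 7, 3],
    'lat': [8, 6, 2, 4],
    'date': ['10/06/2009', '24/02/2010', '19/04/2014', '30/04/2013'],
    'mine': [1, 0, 1, 1],
})

# Temporary 'mm/yyyy' key: everything after the first '/' of each date
for df in (df1, df2):
    df['mon-year'] = df['date'].str.extract(r'/(.*)', expand=False)

# Same grouping as the pd.to_datetime route, apart from the separator
parsed_key = pd.to_datetime(df1['date'], format='%d/%m/%Y').dt.strftime('%m-%Y')
same_grouping = bool((df1['mon-year'].str.replace('/', '-') == parsed_key).all())

keys = ['long', 'lat', 'mon-year']
df3 = (
    df1.merge(
        df2.drop(columns='date'),
        on=keys,
        how='left',
        validate='many_to_one',  # raise MergeError if df2 repeats a key
    )
    .drop(columns='mon-year')
)
# Unmatched rows get NaN; fill with 0 and cast back to integers
df3['mine'] = df3['mine'].fillna(0).astype(int)
print(df3)

# A repeated key in df2 is rejected instead of silently duplicating df1 rows
dup = pd.concat([df2, df2.iloc[[0]]], ignore_index=True)
try:
    df1.merge(dup.drop(columns='date'), on=keys, how='left', validate='many_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
print('duplicate key rejected:', raised)
```

How duplicates should be resolved (first row, maximum of mine, and so on) is not specified in the question, so the sketch only detects them.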
You could merge using multiple keys as follows, although joining on date only matches rows with the exact same day, not the same month and year:
df1.merge(df2, how='left', on=['long', 'lat', 'date'])
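A quick check of that caveat, assuming the question's sample frames: because no pair of rows shares the exact same long, lat and date, every mine value comes back as NaN, which is why the other answers build a month-level key first.

```python
import pandas as pd

df1 = pd.DataFrame({
    'long': [5, 6, 8, 3, 5],
    'lat': [8, 10, 6, 4, 8],
    'date': ['23/06/2009', '05/10/2012', '19/02/2010', '30/04/2014', '01/06/2009'],
    'proximity': ['Near', 'Far', 'Near', 'Near', 'Far'],
})
df2 = pd.DataFrame({
    'long': [5, 8, 7, 3],
    'lat': [8, 6, 2, 4],
    'date': ['10/06/2009', '24/02/2010', '19/04/2014', '30/04/2013'],
    'mine': [1, 0, 1, 1],
})

# Joining on the full date string requires an identical day, month and year
exact = df1.merge(df2, how='left', on=['long', 'lat', 'date'])
print(exact['mine'].isna().all())  # True: no exact-date matches in the sample
```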