I have two Pandas DataFrame objects that need to be joined on a combination of index levels and columns.
DF1 with daily data (index levels are RNK, R_ID, latitude and longitude):
Date FFDI
RNK R_ID latitude longitude
1 0 -39.20000 140.80000 1973-04-02 5.40000
1 -39.20000 140.83786 1973-04-02 5.40000
2 -39.20000 140.87572 1973-04-02 5.40000
3 -39.20000 140.91359 1973-04-02 5.40000
4 -39.20000 140.95145 1973-04-02 5.40000
5 -39.20000 140.98930 1973-04-02 5.40000
6 -39.20000 141.02716 1973-04-02 5.40000
7 -39.20000 141.06502 1973-05-31 5.40000
8 -39.20000 141.10289 1973-05-31 5.50000
9 -39.20000 141.14075 1973-05-31 6.00000
10 -39.20000 141.17860 1973-05-31 6.40000
11 -39.20000 141.21646 1973-05-31 6.80000
12 -39.20000 141.25432 1973-05-31 7.70000
13 -39.20000 141.29219 1973-05-31 7.90000
14 -39.20000 141.33005 1973-05-31 7.00000
15 -39.20000 141.36790 1973-05-31 6.60000
16 -39.20000 141.40576 1973-05-31 6.10000
17 -39.20000 141.44362 1973-05-31 5.00000
18 -39.20000 141.48149 1973-05-31 4.40000
19 -39.20000 141.51935 1972-04-21 4.40000
20 -39.20000 141.55721 1972-04-21 4.40000
21 -39.20000 141.59506 1972-04-21 4.50000
22 -39.20000 141.63292 1972-04-21 4.60000
23 -39.20000 141.67079 1972-04-21 4.70000
24 -39.20000 141.70865 1972-04-21 4.70000
25 -39.20000 141.74651 1972-04-21 4.70000
26 -39.20000 141.78436 1972-04-21 4.70000
27 -39.20000 141.82222 1972-04-21 4.70000
28 -39.20000 141.86009 1972-04-21 4.70000
29 -39.20000 141.89795 1972-04-21 4.70000
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
5 36082 -33.90000 148.90205 1972-12-24 35.70000
36083 -33.90000 148.93991 1974-11-12 36.30000
36084 -33.90000 148.97778 1974-11-12 35.90000
36085 -33.90000 149.01564 1973-11-20 36.80000
36086 -33.90000 149.05350 1973-11-20 37.00000
36087 -33.90000 149.09135 1974-11-12 35.60000
36088 -33.90000 149.12921 1973-01-03 35.90000
36089 -33.90000 149.16708 1973-01-03 34.40000
36090 -33.90000 149.20494 1973-01-03 32.90000
36091 -33.90000 149.24280 1973-01-03 32.20000
36092 -33.90000 149.28065 1973-01-03 32.30000
36093 -33.90000 149.31851 1973-01-03 32.20000
36094 -33.90000 149.35638 1973-01-03 30.20000
36095 -33.90000 149.39424 1973-11-20 28.60000
36096 -33.90000 149.43210 1973-11-20 28.70000
36097 -33.90000 149.46996 1973-11-20 29.10000
36098 -33.90000 149.50781 1973-11-20 30.10000
36099 -33.90000 149.54568 1973-11-20 30.80000
36100 -33.90000 149.58354 1973-01-09 30.60000
36101 -33.90000 149.62140 1973-01-09 30.10000
36102 -33.90000 149.65926 1973-01-09 29.50000
36103 -33.90000 149.69711 1973-01-09 29.20000
36104 -33.90000 149.73499 1973-01-09 29.90000
36105 -33.90000 149.77284 1973-01-09 29.90000
36106 -33.90000 149.81070 1973-01-09 27.60000
36107 -33.90000 149.84856 1973-01-09 24.40000
36108 -33.90000 149.88641 1973-01-09 23.80000
36109 -33.90000 149.92429 1973-01-09 23.80000
36110 -33.90000 149.96214 1973-01-09 24.10000
36111 -33.90000 150.00000 1973-01-09 25.30000
DF2 with hourly data (index = R_ID):
latitude longitude time T_SFC
R_ID
0 -39.20000 140.80000 1972-01-20 00:00:00 15.80000
0 -39.20000 140.80000 1972-01-20 01:00:00 15.90000
0 -39.20000 140.80000 1972-01-20 02:00:00 16.00000
0 -39.20000 140.80000 1972-01-20 03:00:00 16.20000
0 -39.20000 140.80000 1972-01-20 04:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 05:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 06:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 07:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 08:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 09:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 10:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 11:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 12:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 13:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 14:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 15:00:00 16.70000
0 -39.20000 140.80000 1972-01-20 16:00:00 16.70000
0 -39.20000 140.80000 1972-01-20 17:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 18:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 19:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 20:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 21:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 22:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 23:00:00 16.40000
0 -39.20000 140.80000 1972-01-21 00:00:00 16.40000
0 -39.20000 140.80000 1972-01-21 01:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 02:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 03:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 04:00:00 16.10000
0 -39.20000 140.80000 1972-01-21 05:00:00 16.00000
... ... ... ... ...
36111 -38.87551 141.14075 1974-12-30 18:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 19:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 20:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 21:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 22:00:00 14.20000
36111 -38.87551 141.14075 1974-12-30 23:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 00:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 01:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 02:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 03:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 04:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 05:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 06:00:00 14.60000
36111 -38.87551 141.14075 1974-12-31 07:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 08:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 09:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 10:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 11:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 12:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 13:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 14:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 15:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 16:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 17:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 18:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 19:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 20:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 21:00:00 14.60000
36111 -38.87551 141.14075 1974-12-31 22:00:00 14.70000
36111 -38.87551 141.14075 1974-12-31 23:00:00 14.80000
DF1 has a Date column with daily values from 1972-01-20 to 1974-12-31, while DF2 has a time column with hourly values from 1972-01-20T00:00:00 to 1974-12-31T23:00:00. DF1 is sorted by RNK (rank) and FFDI, while DF2 is sorted by R_ID and time. Each R_ID is a reference ID that corresponds to one unique latitude/longitude pair. I want to join DF2 to DF1 on matching R_ID and on the Date that DF2's time value falls on. That is, each row (day) in DF1 should be matched with the 24 hourly rows from DF2 that have the same R_ID and the same day.
The output DataFrame should look like:
time T_SFC
RNK R_ID latitude longitude Date FFDI
1 0 -39.20000 140.80000 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
1 -39.20000 140.83786 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
2 -39.20000 140.87572 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
... ... ... ...
2 0 -39.20000 140.80000 1974-03-07 5.60000 1974-03-07 00:00:00 15.8
1974-03-07 01:00:00 15.9
1974-03-07 02:00:00 16.0
1974-03-07 03:00:00 16.2
1974-03-07 04:00:00 16.6
... ... ... ...
1 -39.20000 140.83786 1973-03-09 5.40000 1973-03-09 00:00:00 15.8
1973-03-09 01:00:00 15.9
1973-03-09 02:00:00 16.0
1973-03-09 03:00:00 15.2
1973-03-09 04:00:00 15.6
... ... ... ...
... ... ... ...
... ... ... ...
5 36082 -33.90000 148.90205 1972-12-24 35.70000 1972-12-24 00:00:00 19.8
1972-12-24 01:00:00 19.1
1972-12-24 02:00:00 22.0
1972-12-24 03:00:00 24.2
1972-12-24 04:00:00 21.6
... ... ... ...
... ... ... ...
36111 -33.90000 150.00000 1973-01-09 25.30000 1973-01-09 00:00:00 19.8
1973-01-09 01:00:00 19.1
1973-01-09 02:00:00 22.0
1973-01-09 03:00:00 24.2
1973-01-09 04:00:00 21.6
... ... ... ...
1973-01-09 23:00:00 19.1
4,333,440 rows x 2 columns
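As a quick sanity check on that row count, the arithmetic below assumes (this is not stated explicitly above) that every R_ID from 0 to 36111 appears once at each of the five RNK values, and that every matched day has all 24 hourly rows in DF2. Under those assumptions the product matches the 4,333,440 rows shown.

```python
# Assumed counts, inferred from the printed samples rather than stated in the question.
n_r_ids = 36111 + 1   # R_ID runs from 0 to 36111
n_ranks = 5           # RNK runs from 1 to 5
hours_per_day = 24    # one DF2 row per hour

rows_in_df1 = n_r_ids * n_ranks
expected_rows = rows_in_df1 * hours_per_day

print(rows_in_df1)
print(expected_rows)
```

If the real join returns fewer rows than this, some (R_ID, Date) pairs are probably missing hours in DF2 or not matching at all.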
Following @politinsa's answer, I tried:
# Add a new column Date and save date part of the time column to it.
df2['Date'] = df2['time'].dt.date.astype('datetime64[ns]')
df_joined = pd.merge(df1, df2, on=['REF_ID', 'Date'], how='inner')
The issue with the output is that the MultiIndex from df1 was not kept: RNK is missing from the output df.
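This matches what the pandas merging documentation describes: when a merge uses only some levels of a MultiIndex, the extra levels are dropped, and the suggested workaround is to move them to columns with reset_index before merging. The sketch below illustrates that workaround on small made-up frames; the values are toy data, and it uses R_ID as in the samples above (substitute REF_ID if that is the real name).

```python
import pandas as pd

# Toy df1 with the same four index levels as in the question (values are illustrative).
df1 = pd.DataFrame({
    "RNK": [1, 2],
    "R_ID": [0, 0],
    "latitude": [-39.2, -39.2],
    "longitude": [140.8, 140.8],
    "Date": pd.to_datetime(["1973-04-02", "1973-04-03"]),
    "FFDI": [5.4, 5.6],
}).set_index(["RNK", "R_ID", "latitude", "longitude"])

# Toy df2 indexed by R_ID, with 48 hourly rows covering the two days above.
hours = pd.date_range("1973-04-02", periods=48, freq="h")
df2 = pd.DataFrame(
    {"time": hours, "T_SFC": 15.0},
    index=pd.Index([0] * 48, name="R_ID"),
)
df2["Date"] = df2["time"].dt.normalize()  # midnight of each timestamp's day

# Move every index level to a column so that nothing is dropped by the merge.
left = df1.reset_index()
# Keep only the columns needed from df2; in the real data this also avoids
# duplicated latitude/longitude columns with _x/_y suffixes.
right = df2.reset_index()[["R_ID", "Date", "time", "T_SFC"]]

joined = (
    left.merge(right, on=["R_ID", "Date"], how="inner")
        .set_index(["RNK", "R_ID", "latitude", "longitude", "Date", "FFDI"])
)
print(joined.head())
```

Rebuilding the index after the merge gives the layout sketched in the desired output: six index levels, with time and T_SFC as the only columns.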
print(df_joined)
time FFDI latitude longitude T_SFC time_original
REF_ID
0 1973-04-02 5.40000 -39.20000 140.80000 16.40000 1973-04-02 00:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 16.00000 1973-04-02 01:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.70000 1973-04-02 02:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 03:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 04:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 05:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 06:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 07:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 08:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 09:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 10:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 11:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 12:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 13:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.00000 1973-04-02 14:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 15:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.30000 1973-04-02 16:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 17:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 18:00:00
... ... ... ...
12000 rows × 6 columns
You could create a column in DF2 containing the date (instead of the datetime), i.e. at row 1973-04-02 01:00:00 you'd have a column Date containing 1973-04-02. Then use a classic inner join (pd.merge(df1, df2, on=['R_ID', 'Date'], how='inner')) and it should do the trick.
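A minimal sketch of that idea on made-up data, with R_ID and Date as ordinary columns in both frames. The toy values, the use of dt.normalize() to derive the date, and the validate argument are my own choices for illustration, not part of the original suggestion.

```python
import pandas as pd

# Daily frame: one row per (R_ID, Date). Values are illustrative.
df1 = pd.DataFrame({
    "R_ID": [0, 1],
    "Date": pd.to_datetime(["1973-04-02", "1973-04-03"]),
    "FFDI": [5.4, 5.5],
})

# Hourly frame: 48 hours for each of the two R_IDs.
hours = pd.date_range("1973-04-02", periods=48, freq="h")
df2 = pd.MultiIndex.from_product(
    [[0, 1], hours], names=["R_ID", "time"]
).to_frame(index=False)
df2["T_SFC"] = 15.0

# Date part of each timestamp, kept as datetime64 so it compares equal to df1["Date"].
df2["Date"] = df2["time"].dt.normalize()

# Inner join; validate raises if (R_ID, Date) is not unique on the df1 side.
joined = pd.merge(df1, df2, on=["R_ID", "Date"], how="inner", validate="one_to_many")

# Each daily row should now be repeated for its 24 hourly rows.
print(joined.groupby(["R_ID", "Date"]).size())
```

Here each of the two daily rows picks up exactly the 24 hourly rows for its own R_ID and day, so the result has 48 rows.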