Merging Data Frames on Unique Values
I have two data frames. One is a generic "template" with a column of dates that runs hourly from now until 4 days from now. The other DF holds data such as Latitude and Longitude; it also has a date column, but its data is only every 3 hours. I need to combine the two data frames so that each lat/lon pair in df2 gets every hourly row from df1.
DF1
Date                 Shift
2021-10-18 01:00:00  a1
2021-10-18 02:00:00  a2
.....
2021-10-18 21:00:00  b2

DF2
Latitude  Longitude  Date                 Temp
39.9      -99.3      2021-10-18 18:00:00  34
39.9      -99.3      2021-10-18 21:00:00  36
.............
39.9      -99.3      2021-10-19 00:00:00  32
Expected Final Data Frame
Latitude Longitude Date Shift Temp
39.9 -99.3 2021-10-18 01:00:00 a1 NaN
39.9 -99.3 2021-10-18 02:00:00 a1 NaN
.....
39.9 -99.3 2021-10-18 17:00:00 b2 NaN
39.9 -99.3 2021-10-18 18:00:00 b2 34
39.9 -99.3 2021-10-18 19:00:00 b2 NaN
DF2 has 3,088 unique Lat/Lon pairs, and each unique pair needs a date column covering 4 days, hour by hour. My final DF should have 299,536 rows (3,088 pairs × 97 hourly timestamps).
Use merge with the how and on options. From the pandas docs:
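import pandas as pd

# Two small frames that share the key column 'a'
df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
df1.merge(df2, how='inner', on='a')  # keep only keys present in both frames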
will give you:
     a  b  c
0  foo  1  3
while using:
df1.merge(df2, how='left', on='a')
will give you:
     a  b    c
0  foo  1  3.0
1  bar  2  NaN
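Applied to the question's layout, one possible approach is to first give every unique lat/lon pair a copy of the hourly template, then left-merge the 3-hourly readings onto that grid. The sketch below is only an illustration: the sample values, the second lat/lon pair, and the names pairs, grid and result are made up, and how='cross' assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical stand-in for DF1: hourly template with Date and Shift.
df1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-10-18 16:00:00", "2021-10-18 17:00:00", "2021-10-18 18:00:00",
        "2021-10-18 19:00:00", "2021-10-18 20:00:00", "2021-10-18 21:00:00",
    ]),
    "Shift": ["b1", "b2", "b2", "b2", "b2", "b2"],
})

# Hypothetical stand-in for DF2: 3-hourly readings for two lat/lon pairs.
df2 = pd.DataFrame({
    "Latitude": [39.9, 39.9, 40.1, 40.1],
    "Longitude": [-99.3, -99.3, -98.7, -98.7],
    "Date": pd.to_datetime(["2021-10-18 18:00:00", "2021-10-18 21:00:00"] * 2),
    "Temp": [34, 36, 30, 31],
})

# 1. One row per unique Latitude/Longitude pair.
pairs = df2[["Latitude", "Longitude"]].drop_duplicates()

# 2. Cross join: every pair is repeated for every hourly row of the template.
grid = pairs.merge(df1, how="cross")

# 3. Left merge the readings onto the grid; hours with no reading get NaN in Temp.
result = grid.merge(df2, how="left", on=["Latitude", "Longitude", "Date"])

print(result)
```

With the real data, result should have (number of unique pairs) × (number of rows in DF1) rows, provided DF2 has at most one reading per pair and timestamp. The Date columns in both frames need the same dtype (for example, both converted with pd.to_datetime) for the keys to match.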