
Merge dataframes in pandas with a combination of keys

I have two dataframes that I need to combine based on a key (an 'incident number'). The key, however, is repeated, because the database that will ingest them requires a particular format for coordinates. How can I join the necessary columns based on a combination of keys?

For example, the two tables look like this:

Incident_Number  Lat/Long  GPSCoordinates
AB123            Lat       32.123
AB123            Long      120.123
CD321            Lat       31.321
CD321            Long      121.321

and...

Incident_Number  Lat/Long  GeoCodeCoordinates
AB123            Lat       35.123
AB123            Long      125.123
CD321            Lat       36.321
CD321            Long      126.321

And I need to get to:

Incident_Number  Lat/Long  GPSCoordinates  GeoCodeCoordinates
AB123            Lat       32.123          35.123
AB123            Long      120.123         125.123
CD321            Lat       31.321          36.321
CD321            Long      121.321         126.321

The number of records is not exactly equal in the two tables, so the merge needs to allow for NaNs. I am essentially trying to add the column 'GeoCodeCoordinates' to the other dataframe on a combination of 'Incident_Number' and 'Lat/Long', so that it treats 'AB123 + Lat' and 'AB123 + Long' each as a single key. Can this be specified within the code, or do I need to create a new column and compute that combined value to use as the key?
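A minimal sketch with made-up frames `gps` and `geo` that copy the sample tables above (the frame names and the `_key` helper column are illustrative only). `DataFrame.merge` accepts a list of column names for `on`, so the pair of columns already acts as the combined key; the explicit composite-key version is included only to show that it produces the same result.

```python
import pandas as pd

# Hypothetical frames copying the sample tables from the question.
gps = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123", "CD321", "CD321"],
    "Lat/Long": ["Lat", "Long", "Lat", "Long"],
    "GPSCoordinates": [32.123, 120.123, 31.321, 121.321],
})
geo = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123", "CD321", "CD321"],
    "Lat/Long": ["Lat", "Long", "Lat", "Long"],
    "GeoCodeCoordinates": [35.123, 125.123, 36.321, 126.321],
})

# Option 1: pass both columns to `on`; rows match only when the pair matches.
keys = ["Incident_Number", "Lat/Long"]
merged = gps.merge(geo, on=keys)

# Option 2 (comparison only): build an explicit composite key column.
# "_key" is a hypothetical helper name, dropped again after the merge.
gps_k = gps.assign(_key=gps["Incident_Number"] + "|" + gps["Lat/Long"])
geo_k = geo.assign(_key=geo["Incident_Number"] + "|" + geo["Lat/Long"])
merged_k = (
    gps_k.merge(geo_k[["_key", "GeoCodeCoordinates"]], on="_key")
    .drop(columns="_key")
)

print(merged)
```

Passing a list to `on` avoids the string concatenation and keeps the original key columns intact, so a helper column is not needed.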

I imagine I went about this in a bit of a goofy way. The Lat and Long were originally stored in separate fields, and I used .melt() to make the data longer. The database that will ultimately take this in requires the long format for the Lat/Long field.

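# Identifier columns: every column except the two coordinate columns
GPSColList = list(GPSRecords.columns)
GPSColList.remove('Latitude')
GPSColList.remove('Longitude')

# Stack Latitude/Longitude into rows: 'Lat/Long' holds the original column name,
# 'GPSCoordinates' holds the coordinate value
GPSMelt = GPSRecords.melt(id_vars=GPSColList, value_vars=['Latitude', 'Longitude'], var_name='Lat/Long', value_name="GPSCoordinates")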
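For reference, a self-contained version of the melt step on a small, hypothetical `GPSRecords` frame. `melt` writes the original column names (`Latitude`, `Longitude`) into the `var_name` column; relabelling them to `Lat`/`Long` is an assumption made here to match the tables above. Whatever labels are used, they must be identical in both melted frames, or the later merge will not match.

```python
import pandas as pd

# Hypothetical wide-format input: one row per incident.
GPSRecords = pd.DataFrame({
    "Incident_Number": ["AB123", "CD321"],
    "Latitude": [32.123, 31.321],
    "Longitude": [120.123, 121.321],
})

# Every column except the coordinate columns stays as an identifier.
id_cols = [c for c in GPSRecords.columns if c not in ("Latitude", "Longitude")]

GPSMelt = GPSRecords.melt(
    id_vars=id_cols,
    value_vars=["Latitude", "Longitude"],
    var_name="Lat/Long",
    value_name="GPSCoordinates",
)

# Assumed relabelling to the short names used in the question's tables.
GPSMelt["Lat/Long"] = GPSMelt["Lat/Long"].replace({"Latitude": "Lat", "Longitude": "Long"})

# melt emits all Latitude rows before all Longitude rows; sorting puts
# each incident's Lat/Long pair next to each other.
GPSMelt = GPSMelt.sort_values(["Incident_Number", "Lat/Long"], ignore_index=True)
print(GPSMelt)
```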

As the two sets of coordinates were in separate fields, I created two dataframes, one for each set of coordinates, and melted them separately. My attempt to merge them looks like this:

mergeMelt = pd.merge(GPSMelt, GeoCodeMelt[["GeoCodeCoordinates"]], on=['Incident_Number', 'Lat/Long'])

The result is KeyError: 'Incident_Number'

Try:

cols = ['Incident_Number', 'Lat/Long', 'GeoCodeCoordinates']
mergeMelt = GPSMelt.merge(GeoCodeMelt[cols], on=cols[:-1])
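The merge above uses the default `how='inner'`, which drops rows that have no match in the other frame. Since the question says the record counts differ and NaNs should be allowed, `how='left'` is likely closer to the intent. A sketch on hypothetical melted frames in which the CD321/Long GeoCode row is missing:

```python
import pandas as pd

# Hypothetical melted frames; GeoCodeMelt lacks the CD321/Long row.
GPSMelt = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123", "CD321", "CD321"],
    "Lat/Long": ["Lat", "Long", "Lat", "Long"],
    "GPSCoordinates": [32.123, 120.123, 31.321, 121.321],
})
GeoCodeMelt = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123", "CD321"],
    "Lat/Long": ["Lat", "Long", "Lat"],
    "GeoCodeCoordinates": [35.123, 125.123, 36.321],
})

cols = ["Incident_Number", "Lat/Long", "GeoCodeCoordinates"]

# Default how="inner": the unmatched CD321/Long row disappears.
inner = GPSMelt.merge(GeoCodeMelt[cols], on=cols[:-1])

# how="left" keeps every GPS row and fills the missing value with NaN;
# indicator=True adds a "_merge" column ("both" or "left_only").
left = GPSMelt.merge(GeoCodeMelt[cols], on=cols[:-1], how="left", indicator=True)
print(left)
```

The `_merge` column is optional; it is just a quick way to see which GPS rows had no GeoCode match before handing the data to the database.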

The KeyError: 'Incident_Number' is raised because you use GeoCodeMelt[['GeoCodeCoordinates']], so the columns Incident_Number and Lat/Long don't exist in the right-hand dataframe when you merge.
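A small reproduction of the error, assuming two tiny hypothetical melted frames: selecting with double brackets returns a one-column DataFrame that keeps only `GeoCodeCoordinates`, so there is no `Incident_Number` column on the right-hand side to join on.

```python
import pandas as pd

# Hypothetical melted frames with matching keys.
GPSMelt = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123"],
    "Lat/Long": ["Lat", "Long"],
    "GPSCoordinates": [32.123, 120.123],
})
GeoCodeMelt = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123"],
    "Lat/Long": ["Lat", "Long"],
    "GeoCodeCoordinates": [35.123, 125.123],
})

# Double brackets keep only the listed column: the key columns are gone.
right = GeoCodeMelt[["GeoCodeCoordinates"]]
print(list(right.columns))

error = None
try:
    pd.merge(GPSMelt, right, on=["Incident_Number", "Lat/Long"])
except KeyError as exc:
    # pandas cannot find the join column in the right-hand frame.
    error = exc
    print("KeyError:", exc)
```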
