Merge dataframes in pandas with a combination of keys
I have two dataframes that I need to combine based on a key (an 'incident number'). The key is repeated, however, because the database that will ingest the data requires a particular format for coordinates. How can I join the necessary columns based on a combination of keys?
For example, the two tables look like:
Incident_Number | Lat/Long | GPSCoordinates |
---|---|---|
AB123 | Lat | 32.123 |
AB123 | Long | 120.123 |
CD321 | Lat | 31.321 |
CD321 | Long | 121.321 |
and...
Incident_Number | Lat/Long | GeoCodeCoordinates |
---|---|---|
AB123 | Lat | 35.123 |
AB123 | Long | 125.123 |
CD321 | Lat | 36.321 |
CD321 | Long | 126.321 |
And I need to get to...
Incident_Number | Lat/Long | GPSCoordinates | GeoCodeCoordinates |
---|---|---|---|
AB123 | Lat | 32.123 | 35.123 |
AB123 | Long | 120.123 | 125.123 |
CD321 | Lat | 31.321 | 36.321 |
CD321 | Long | 121.321 | 126.321 |
The number of records is not identical in each table, so the result needs to allow for NaNs. I am essentially trying to add the column 'GeoCodeCoordinates' to the other dataframe on a combination of 'Incident_Number' and 'Lat/Long', so that 'AB123 + Lat' and 'AB123 + Long' are each treated as a single key. Can this be specified within code, or do I need to create a new column that computes that combined value as a key?
I imagine I went about this in a bit of a goofy way. The Lat and Long were originally stored in separate fields and I used .melt() to make the data longer. The database that will ultimately take this in requires the longer format for the Lat/Long field.
GPSColList = list(GPSRecords.columns)
GPSColList.remove('Latitude')
GPSColList.remove('Longitude')
GPSMelt = GPSRecords.melt(id_vars=GPSColList, value_vars=['Latitude', 'Longitude'], var_name='Lat/Long', value_name="GPSCoordinates")
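To make the reshaping step concrete, here is a minimal sketch of the same melt on a toy frame. The sample values are taken from the tables above, but the wide layout of `GPSRecords` and the `id_cols` name are assumptions for illustration; the real frame presumably has more columns.

```python
import pandas as pd

# Hypothetical wide-format sample; the real GPSRecords likely has more columns.
GPSRecords = pd.DataFrame({
    "Incident_Number": ["AB123", "CD321"],
    "Latitude": [32.123, 31.321],
    "Longitude": [120.123, 121.321],
})

# Every column except the two coordinate columns is kept as an identifier.
id_cols = [c for c in GPSRecords.columns if c not in ("Latitude", "Longitude")]

# One row per (incident, coordinate type) instead of one row per incident.
GPSMelt = GPSRecords.melt(
    id_vars=id_cols,
    value_vars=["Latitude", "Longitude"],
    var_name="Lat/Long",
    value_name="GPSCoordinates",
)
print(GPSMelt)
```

Note that the 'Lat/Long' column produced this way holds the original column names ('Latitude' and 'Longitude'), so both melted frames need to use the same labels for a later merge on that column to match.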
As the two sets of coordinates were in separate fields, I created two dataframes, one for each set of coordinates, and melted them separately. My attempt to merge them looks like:
mergeMelt = pd.merge(GPSMelt, GeoCodeMelt[["GeoCodeCoordinates"]], on=['Incident_Number', 'Lat/Long'])
The result is KeyError: 'Incident_Number'
Try:
cols = ['Incident_Number', 'Lat/Long', 'GeoCodeCoordinates']
mergeMelt = GPSMelt.merge(GeoCodeMelt[cols], on=cols[:-1])
The KeyError: 'Incident_Number' is raised because you pass GeoCodeMelt[['GeoCodeCoordinates']] as the right-hand frame, so the columns Incident_Number and Lat/Long don't exist in it when you merge.
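Since the question mentions that the two tables do not have the same records, here is a self-contained sketch of the merge on both keys that keeps unmatched rows. The data mirrors the tables in the question, except for incident 'EF555', which is a made-up record added to show the NaN case. Note that `merge` defaults to `how='inner'`, which drops rows without a match; `how='left'` is one possible choice here, not necessarily the one the final pipeline needs.

```python
import pandas as pd

GPSMelt = pd.DataFrame({
    # 'EF555' is a hypothetical incident with no geocoded counterpart.
    "Incident_Number": ["AB123", "AB123", "CD321", "CD321", "EF555", "EF555"],
    "Lat/Long": ["Lat", "Long", "Lat", "Long", "Lat", "Long"],
    "GPSCoordinates": [32.123, 120.123, 31.321, 121.321, 30.555, 122.555],
})
GeoCodeMelt = pd.DataFrame({
    "Incident_Number": ["AB123", "AB123", "CD321", "CD321"],
    "Lat/Long": ["Lat", "Long", "Lat", "Long"],
    "GeoCodeCoordinates": [35.123, 125.123, 36.321, 126.321],
})

# Passing a list to `on` makes each (Incident_Number, Lat/Long) pair the key,
# so no concatenated helper column is needed.
keys = ["Incident_Number", "Lat/Long"]

mergeMelt = GPSMelt.merge(
    GeoCodeMelt[keys + ["GeoCodeCoordinates"]],  # key columns must be present
    on=keys,
    how="left",  # keep every GPSMelt row; unmatched rows get NaN
)
print(mergeMelt)
```

With `how='left'`, rows that appear only in `GeoCodeMelt` are still dropped; `how='outer'` keeps unmatched rows from both frames.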