Need to merge two pandas DataFrames using two columns, latitude and longitude
This is my dataframe #1: city names with their latitude and longitude.
df1 = pd.DataFrame({"city":['delhi','new york','london','paris','chennai'],"lat":[12.23,22.444,23.233,45.32,34.22],"long":[11.22,22.332,34.23,55.23,24.22]})
This is dataframe #2: country names with latitude and longitude.
df2 = pd.DataFrame({"country":['India','US','UK','France','India'],"lat":[12.13,22.54,22.33,45.32,34.22],"long":[11.12,22.132,34.23,54.23,24.22]})
I need to match on the two columns lat and long to merge these two tables. The problem is that lat and long do not match exactly; the values differ by + or - 0.1 or 0.2. (If they matched exactly, I could use pd.merge.) The lat and long values here are not real, just an example.
Expected result:
result = pd.DataFrame({"city":['delhi','new york','london','paris','chennai'],"country":['India','US','UK','France','India'],"lat":[12.13,22.54,22.33,45.32,34.22],"long":[11.12,22.132,34.23,54.23,24.22]})
What is the best approach to merge these tables?
For example, a cross merge filtered on the coordinate differences:
(df1.assign(dummy=1)
.merge(df2.assign(dummy=1),on='dummy')
.query('abs(lat_x-lat_y)<=0.1 and abs(long_x-long_y)<=0.2')
.drop('dummy', axis=1)
)
Output:
city lat_x long_x country lat_y long_y
0 delhi 12.230 11.220 India 12.13 11.120
6 new york 22.444 22.332 US 22.54 22.132
24 chennai 34.220 24.220 India 34.22 24.220
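The fixed tolerances above drop london and paris, because in the sample data their coordinates differ from the corresponding country rows by more than 0.2. If every city should instead be paired with its closest country point, a nearest-neighbour match is one alternative. The following is a minimal sketch, assuming plain Euclidean distance on the raw lat/long values is acceptable (fine for this made-up data, not a real geographic distance). The `max_dist` cutoff is a hypothetical value chosen for illustration and is not from the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "city": ["delhi", "new york", "london", "paris", "chennai"],
    "lat": [12.23, 22.444, 23.233, 45.32, 34.22],
    "long": [11.22, 22.332, 34.23, 55.23, 24.22],
})
df2 = pd.DataFrame({
    "country": ["India", "US", "UK", "France", "India"],
    "lat": [12.13, 22.54, 22.33, 45.32, 34.22],
    "long": [11.12, 22.132, 34.23, 54.23, 24.22],
})

# Pairwise Euclidean distances, shape (len(df1), len(df2)), via broadcasting.
a = df1[["lat", "long"]].to_numpy()
b = df2[["lat", "long"]].to_numpy()
dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

# For each city, the position of the closest row in df2 and its distance.
nearest = dist.argmin(axis=1)
nearest_dist = dist[np.arange(len(df1)), nearest]

# Hypothetical cutoff: pairs farther apart than this are discarded.
max_dist = 1.5

matched = pd.DataFrame({
    "city": df1["city"].to_numpy(),
    "country": df2["country"].to_numpy()[nearest],
    "lat": df2["lat"].to_numpy()[nearest],
    "long": df2["long"].to_numpy()[nearest],
    "distance": nearest_dist,
})
matched = matched[matched["distance"] <= max_dist].reset_index(drop=True)
print(matched)
```

Note that the distance matrix grows with len(df1) * len(df2), so this sketch only suits tables small enough for that matrix to fit in memory.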
Geopandas may be used here.
Provided that you have the boundaries of countries as polygons, you can use spatial joins.
In your question, you are reducing countries to single points, which may not be the best representation.
Example from the documentation:
In a Spatial Join, two geometry objects are merged based on their spatial relationship to one another.
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
In [11]: countries.head()
Out[11]:
geometry country
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... Fiji
1 POLYGON ((33.903711197 -0.950000000, 34.072620... Tanzania
2 POLYGON ((-8.665589565 27.656425890, -8.665124... W. Sahara
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... Canada
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... United States of America
In [12]: cities.head()
Out[12]:
name geometry
0 Vatican City POINT (12.453386545 41.903282180)
1 San Marino POINT (12.441770158 43.936095835)
2 Vaduz POINT (9.516669473 47.133723774)
3 Luxembourg POINT (6.130002806 49.611660379)
4 Palikir POINT (158.149974324 6.916643696)
# Execute spatial join
In [13]: cities_with_country = geopandas.sjoin(cities, countries, how="inner", predicate='intersects')  # 'op' was renamed to 'predicate' in geopandas 0.10
In [14]: cities_with_country.head()
Out[14]:
name geometry index_right country
0 Vatican City POINT (12.453386545 41.903282180) 141 Italy
1 San Marino POINT (12.441770158 43.936095835) 141 Italy
192 Rome POINT (12.481312563 41.897901485) 141 Italy
2 Vaduz POINT (9.516669473 47.133723774) 114 Austria
184 Vienna POINT (16.364693097 48.201961137) 114 Austria
If you don't have polygons representing the countries, you need to extend the point representing each country to an area. You can do this with the buffer method in Shapely, which expands a point into an area of a given distance around it:
Point(0, 0).buffer(10.0), assuming a point at coordinates [0, 0] and a distance of 10.0.
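Applied to the data in the question, the buffer idea might look like the sketch below. It assumes geopandas 0.10 or later (for the `predicate` argument of `sjoin`), and the `radius` value is a hypothetical choice, large enough to cover the offsets in the sample data. No CRS is set, so the buffer distance is in raw coordinate units, which is only reasonable because the coordinates here are made up.

```python
import geopandas
import pandas as pd

df1 = pd.DataFrame({
    "city": ["delhi", "new york", "london", "paris", "chennai"],
    "lat": [12.23, 22.444, 23.233, 45.32, 34.22],
    "long": [11.22, 22.332, 34.23, 55.23, 24.22],
})
df2 = pd.DataFrame({
    "country": ["India", "US", "UK", "France", "India"],
    "lat": [12.13, 22.54, 22.33, 45.32, 34.22],
    "long": [11.12, 22.132, 34.23, 54.23, 24.22],
})

# Cities stay as points; x is longitude, y is latitude.
cities = geopandas.GeoDataFrame(
    df1[["city"]].copy(),
    geometry=geopandas.points_from_xy(df1["long"], df1["lat"]),
)

# Country points are built the same way, then expanded to circular areas.
countries = geopandas.GeoDataFrame(
    df2[["country"]].copy(),
    geometry=geopandas.points_from_xy(df2["long"], df2["lat"]),
)
radius = 1.1  # hypothetical tolerance, in the same units as lat/long
countries["geometry"] = countries.geometry.buffer(radius)

# Keep each city that falls inside (or touches) a country's buffered area.
joined = geopandas.sjoin(cities, countries, how="inner", predicate="intersects")
joined = joined.sort_index()
print(joined[["city", "country"]])
```

If the buffers of two country points overlap, a city inside the overlap appears once per matching country, so the radius should be kept as small as the data allows.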