Identify US county from latitude and longitude using Python
I am using the code below to identify the US county. The data is taken from Yelp, which provides lat/lon coordinates.
id | latitude | longitude
---|---|---
1 | 40.017544 | -105.283348
2 | 45.588906 | -122.593331
import pandas
df = pandas.read_json("/Users/yelp/yelp_academic_dataset_business.json", lines=True, encoding='utf-8')
# Identify county
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="http")
df['county'] = geolocator.reverse(df['latitude'],df['longitude'])
The error was "TypeError: reverse() takes 2 positional arguments but 3 were given".
Nominatim.reverse takes a single coordinate pair as its query, so passing latitude and longitude as two separate arguments is what triggers the TypeError. Beyond that, the issue is that you are passing it pandas dataframe columns. df['latitude'] here refers to the entire column in your data, not just one value, and since geopy is independent of pandas, it doesn't support processing an entire column and instead just sees that the input isn't a valid number.
Instead, try looping through the rows:
county = []
for row in range(len(df)):
    # reverse() expects one (latitude, longitude) tuple per call
    county.append(geolocator.reverse((df['latitude'][row], df['longitude'][row])))
(Note the double parentheses: the coordinates are passed as a single tuple.)
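One thing the loop above leaves open is that reverse() returns a whole Location object (or None), not a county name. The sketch below is one way to pull the county out of it. It assumes Nominatim's default address details are present under location.raw['address'] and that the county, when available, is stored under the 'county' key. The helper names county_from_location and lookup_counties are made up for illustration and are not part of geopy or pandas.

```python
def county_from_location(location):
    """Return the county name from a geopy-style Location, or None if absent."""
    if location is None:
        # reverse() can return None when nothing is found for the coordinates
        return None
    # Assumption: address details are stored under raw['address']
    address = location.raw.get("address", {})
    return address.get("county")


def lookup_counties(latitudes, longitudes, reverse):
    """Call reverse((lat, lon)) once per row and collect the county names.

    `reverse` is any callable that accepts a (lat, lon) tuple,
    for example geolocator.reverse from the question.
    """
    return [
        county_from_location(reverse((lat, lon)))
        for lat, lon in zip(latitudes, longitudes)
    ]
```

With the geolocator from the question, this could be called as county = lookup_counties(df['latitude'], df['longitude'], geolocator.reverse). Each row is one request to the public Nominatim service, so for a dataset of this size it is probably worth wrapping the callable in geopy.extra.rate_limiter.RateLimiter (for example with min_delay_seconds=1) before passing it in.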
Then, insert the column into the dataframe:
df.insert(index, 'county', county, True)
(index should be the column position you want, and the boolean value at the end is allow_duplicates, which permits inserting a column whose name already exists in the dataframe.)