I have this data frame:
   userId   latitude  longitude             dateTime
0  121165  30.314368  76.384381  2018-02-01 00:01:57
1   95592  13.186810  77.643769  2018-02-01 00:02:17
2  111435  28.512889  77.088154  2018-02-01 00:04:02
3  129532   9.828420  76.310357  2018-02-01 00:06:03
4   95592  13.121986  77.610539  2018-02-01 00:08:54
I want to create a new dataframe column like:
   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi
I saw this code here, but it's not working for me.
This is the code given there:
from urllib2 import urlopen
import json
def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country
for i,j in df['latitude'], df['longitude']:
    getplace(i, j)
I get error at this place:
components = j['results'][0]['address_components']
list index out of range
I tried some latitude/longitude values from the UK and it worked, but it does not work for locations in India.
So now I want to try out something like this:
if i,j in zip(range(79,80),range(83,84)):
    df['City']='Bengaluru'
elif i,j in zip(range(13,14),range(70,71)):
    df['City']='Delhi'
and so on. So how can I assign city in a more feasible manner using latitude and longitude values?
The code snippet that you are using is from 2013; the Google API has changed since then, and 'postal_town' is not among the types returned for these coordinates.
You can use the following code, which uses the requests library and adds a guard for the case where no results are returned.
In [47]: import requests
In [48]: def location(lat, long):
    ...:     url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false'.format(lat, long)
    ...:     r = requests.get(url)
    ...:     r_json = r.json()
    ...:     if len(r_json['results']) < 1:
    ...:         return None, None  # guard: the API returned no results
    ...:     res = r_json['results'][0]['address_components']
    ...:     country = next((c['long_name'] for c in res if 'country' in c['types']), None)
    ...:     locality = next((c['long_name'] for c in res if 'locality' in c['types']), None)
    ...:     return locality, country
    ...:
In [49]: location(28.512889, 77.088154)
Out[49]: ('Gurugram', 'India')
This function looks for the 'locality' type, and it actually doesn't return anything for the 2nd row of the DataFrame. You can choose which fields you want by inspecting the results (this is for a lat, long value of 28.512889, 77.088154):
[{'long_name': 'Udyog Vihar',
'short_name': 'Udyog Vihar',
'types': ['political', 'sublocality', 'sublocality_level_2']},
{'long_name': 'Kapas Hera Estate',
'short_name': 'Kapas Hera Estate',
'types': ['political', 'sublocality', 'sublocality_level_1']},
{'long_name': 'Gurugram',
'short_name': 'Gurugram',
'types': ['locality', 'political']},
{'long_name': 'Gurgaon',
'short_name': 'Gurgaon',
'types': ['administrative_area_level_2', 'political']},
{'long_name': 'Haryana',
'short_name': 'HR',
'types': ['administrative_area_level_1', 'political']},
{'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']},
{'long_name': '122016', 'short_name': '122016', 'types': ['postal_code']}]
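One way to cope with points that have no 'locality' entry is to try several component types in order of preference. The sketch below is only an illustration: pick_component and the FALLBACK order are hypothetical names and choices, not part of the Geocoding API, and the sample list is a trimmed copy of the listing above.

```python
def pick_component(components, preferred_types):
    """Return the long_name of the first component matching a preferred type.

    Types are tried in the order given, so earlier entries win.
    """
    for wanted in preferred_types:
        for component in components:
            if wanted in component['types']:
                return component['long_name']
    return None  # nothing matched


# Trimmed copy of the address_components listing shown above.
sample = [
    {'long_name': 'Udyog Vihar', 'short_name': 'Udyog Vihar',
     'types': ['political', 'sublocality', 'sublocality_level_2']},
    {'long_name': 'Gurugram', 'short_name': 'Gurugram',
     'types': ['locality', 'political']},
    {'long_name': 'Haryana', 'short_name': 'HR',
     'types': ['administrative_area_level_1', 'political']},
    {'long_name': 'India', 'short_name': 'IN',
     'types': ['country', 'political']},
]

# Hypothetical fallback order: city first, then district, then state.
FALLBACK = ['locality', 'administrative_area_level_2',
            'administrative_area_level_1']

city = pick_component(sample, FALLBACK)

# Simulate a result that has no 'locality' entry at all.
without_locality = [c for c in sample if 'locality' not in c['types']]
region = pick_component(without_locality, FALLBACK)
```

Whether a district or state name is an acceptable substitute for a city depends on what the column is used for, so the fallback list is worth adjusting to your data.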
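To apply the location function to your DataFrame, you can use numpy's vectorize like so (remember that the second row won't return anything). Because location returns a (locality, country) pair, vectorize returns two arrays, and [0] keeps only the localities:
In [71]: import numpy as np
In [72]: df['locality'] = np.vectorize(location)(df['latitude'], df['longitude'])[0]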
In [73]: df
Out[73]:
   userId   latitude  longitude             dateTime   locality
0  121165  30.314368  76.384381  2018-02-01 00:01:57    Patiala
1   95592  13.186810  77.643769  2018-02-01 00:02:17       None
2  111435  28.512889  77.088154  2018-02-01 00:04:02   Gurugram
3  129532   9.828420  76.310357  2018-02-01 00:06:03  Ezhupunna
4   95592  13.121986  77.610539  2018-02-01 00:08:54  Bengaluru
PS: I noticed that the cities in your desired output don't match the coordinates.
PPS: Note that this may take some time, as the function has to query the API once for every row.
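Since every lookup is a network round trip, repeated or nearly identical coordinates are worth caching. This is a minimal sketch, assuming a hypothetical fake_lookup in place of the real API call so that it runs offline; rounding to three decimal places (on the order of 100 m) is also an assumption you may want to tune.

```python
from functools import lru_cache

calls = []  # records every simulated API request


def fake_lookup(lat, lon):
    """Hypothetical stand-in for the real geocoding request."""
    calls.append((lat, lon))
    return 'city@{0},{1}'.format(lat, lon)


@lru_cache(maxsize=None)
def _cached(lat, lon):
    # Runs the lookup only the first time a (lat, lon) key is seen.
    return fake_lookup(lat, lon)


def cached_location(lat, lon, ndigits=3):
    """Round the coordinates so nearby points share one cache entry."""
    return _cached(round(lat, ndigits), round(lon, ndigits))


first = cached_location(13.186810, 77.643769)
second = cached_location(13.186812, 77.643771)  # rounds to the same key
third = cached_location(28.512889, 77.088154)
```

Here the second call is served from the cache, so only two simulated requests are made for three points.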
You can also write the location function with broad coordinate ranges, but this is very crude and each range may cover too wide an area. You can then use the function in the same way as shown previously:
In [21]: def location(lat, long):
    ...:     if 9 <= lat < 10 and 76 <= long < 77:
    ...:         return 'Chennai'
    ...:     elif 13 <= lat < 14 and 77 <= long < 78:
    ...:         return 'Delhi'
    ...:     elif 28 <= lat < 29 and 77 <= long < 78:
    ...:         return 'Mumbai'
    ...:     elif 30 <= lat < 31 and 76 <= long < 77:
    ...:         return 'Bengaluru'
    ...:
In [22]: df['city'] = np.vectorize(location)(df['latitude'], df['longitude'])
In [23]: df
Out[23]:
   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi
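The same rules can be kept as data rather than as an if/elif chain, which makes it easier to add or adjust cities. This is a sketch only: CITY_BOXES and city_for are illustrative names, and the boxes simply copy the crude ranges and the city labels of the desired output used above, so they are no more accurate than that function.

```python
# (city, lat_min, lat_max, lon_min, lon_max); upper bounds are exclusive.
CITY_BOXES = [
    ('Chennai', 9, 10, 76, 77),
    ('Delhi', 13, 14, 77, 78),
    ('Mumbai', 28, 29, 77, 78),
    ('Bengaluru', 30, 31, 76, 77),
]


def city_for(lat, lon, boxes=CITY_BOXES):
    """Return the first city whose box contains the point, else None."""
    for name, lat_min, lat_max, lon_min, lon_max in boxes:
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            return name
    return None  # point falls outside every box


# The five coordinate pairs from the DataFrame in the question.
points = [
    (30.314368, 76.384381),
    (13.186810, 77.643769),
    (28.512889, 77.088154),
    (9.828420, 76.310357),
    (13.121986, 77.610539),
]
cities = [city_for(lat, lon) for lat, lon in points]
```

With a DataFrame, the same list comprehension over zip(df['latitude'], df['longitude']) can be assigned directly to df['city'].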