Assigning City Names from Latitude/Longitude Values in a Pandas DataFrame

I have this data frame:

    userId      latitude    longitude        dateTime
0   121165      30.314368   76.384381   2018-02-01 00:01:57
1   95592       13.186810   77.643769   2018-02-01 00:02:17
2   111435      28.512889   77.088154   2018-02-01 00:04:02
3   129532      9.828420    76.310357   2018-02-01 00:06:03
4   95592       13.121986   77.610539   2018-02-01 00:08:54

I want to add a new city column, like this:

   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi

I found this code here, but it isn't working.

This is the code given there:

import json
from urllib.request import urlopen  # Python 3; the original snippet used Python 2's urllib2

def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    # Raises IndexError when the API returns an empty 'results' list
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country

# zip() pairs each latitude with its longitude, row by row
places = [getplace(lat, lon) for lat, lon in zip(df['latitude'], df['longitude'])]

I get an error on this line:

components = j['results'][0]['address_components']

IndexError: list index out of range

When I tried latitude/longitude values in the UK it worked, but it doesn't work for locations in Indian states.

So now I want to try out something like this:

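# Check which latitude/longitude box each row falls into
for idx, lat, lon in zip(df.index, df['latitude'], df['longitude']):
    if 79 <= lat < 80 and 83 <= lon < 84:
        df.loc[idx, 'City'] = 'Bengaluru'
    elif 13 <= lat < 14 and 70 <= lon < 71:
        df.loc[idx, 'City'] = 'Delhi'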

and so on. Is there a more practical way to assign a city from the latitude and longitude values?

The code snippet you are using is from 2013; the Google API has changed since then, and 'postal_town' is no longer available.
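The IndexError only tells you that 'results' came back empty. The Geocoding API's JSON response also carries a 'status' field (for example OK, ZERO_RESULTS, OVER_QUERY_LIMIT or REQUEST_DENIED), and Google now rejects requests made without an API key, so checking the status is a quick way to see why a lookup failed. A minimal sketch, using a hypothetical helper named first_address_components; it works on an already-parsed response (such as r.json()), so it needs no network access:

```python
def first_address_components(response_json):
    """Return address_components of the first result, or raise with the API status.

    `response_json` is the decoded JSON body of a Geocoding API response.
    """
    status = response_json.get("status")
    results = response_json.get("results", [])
    if status != "OK" or not results:
        # Failed responses may also include an 'error_message' field
        detail = response_json.get("error_message", "")
        raise ValueError(f"Geocoding failed with status {status!r}. {detail}".strip())
    return results[0]["address_components"]


# Example: an empty result raises a readable error instead of IndexError
try:
    first_address_components({"status": "ZERO_RESULTS", "results": []})
except ValueError as exc:
    print(exc)  # Geocoding failed with status 'ZERO_RESULTS'.
```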

You can use the following code, which uses the requests library and returns (None, None) when no results come back.

import requests

def location(lat, long):
    # Note: Google now also requires an API key (a &key=... parameter) for this endpoint
    url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false'.format(lat, long)
    r = requests.get(url)
    r_json = r.json()
    # Guard: return (None, None) instead of failing when there are no results
    if len(r_json['results']) < 1:
        return None, None
    res = r_json['results'][0]['address_components']
    country = next((c['long_name'] for c in res if 'country' in c['types']), None)
    locality = next((c['long_name'] for c in res if 'locality' in c['types']), None)
    return locality, country

>>> location(28.512889, 77.088154)
('Gurugram', 'India')

This function looks up 'locality', which not every location has; it returns nothing for the second row of the DataFrame. You can choose which fields you want by inspecting the results. These are the address components for the lat, long value 28.512889, 77.088154:

[{'long_name': 'Udyog Vihar',
  'short_name': 'Udyog Vihar',
  'types': ['political', 'sublocality', 'sublocality_level_2']},
 {'long_name': 'Kapas Hera Estate',
  'short_name': 'Kapas Hera Estate',
  'types': ['political', 'sublocality', 'sublocality_level_1']},
 {'long_name': 'Gurugram',
  'short_name': 'Gurugram',
  'types': ['locality', 'political']},
 {'long_name': 'Gurgaon',
  'short_name': 'Gurgaon',
  'types': ['administrative_area_level_2', 'political']},
 {'long_name': 'Haryana',
  'short_name': 'HR',
  'types': ['administrative_area_level_1', 'political']},
 {'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']},
 {'long_name': '122016', 'short_name': '122016', 'types': ['postal_code']}]
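Picking a field by its type can be factored into a small helper, with a fallback list for rows (like the second one) that have no locality. This is a sketch; the helper names component and best_place_name and the fallback order (locality, then administrative_area_level_2, then administrative_area_level_1) are assumptions to adjust after inspecting your own results:

```python
def component(components, wanted_type, field="long_name"):
    """Return `field` of the first address component whose types include `wanted_type`."""
    return next((c[field] for c in components if wanted_type in c["types"]), None)


def best_place_name(components,
                    preferred=("locality", "administrative_area_level_2",
                               "administrative_area_level_1")):
    """Try each component type in order and return the first name found (or None)."""
    for wanted_type in preferred:
        name = component(components, wanted_type)
        if name is not None:
            return name
    return None


# Abbreviated copy of the address components listed above
components = [
    {"long_name": "Gurugram", "short_name": "Gurugram", "types": ["locality", "political"]},
    {"long_name": "Gurgaon", "short_name": "Gurgaon", "types": ["administrative_area_level_2", "political"]},
    {"long_name": "Haryana", "short_name": "HR", "types": ["administrative_area_level_1", "political"]},
    {"long_name": "India", "short_name": "IN", "types": ["country", "political"]},
]

print(component(components, "administrative_area_level_1"))  # Haryana
print(component(components, "country", field="short_name"))  # IN
print(best_place_name(components[1:]))                        # Gurgaon (no locality entry)
```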

To apply the location function to your DataFrame, you can use NumPy's vectorize like so (remember that the second row won't return a locality):

>>> import numpy as np
>>> # location() returns a (locality, country) tuple, so vectorize yields two arrays;
>>> # otypes=[object, object] keeps missing values as None
>>> locality, country = np.vectorize(location, otypes=[object, object])(df['latitude'], df['longitude'])
>>> df['locality'] = locality
>>> df
   userId   latitude  longitude             dateTime   locality
0  121165  30.314368  76.384381  2018-02-01 00:01:57    Patiala
1   95592  13.186810  77.643769  2018-02-01 00:02:17       None
2  111435  28.512889  77.088154  2018-02-01 00:04:02   Gurugram
3  129532   9.828420  76.310357  2018-02-01 00:06:03  Ezhupunna
4   95592  13.121986  77.610539  2018-02-01 00:08:54  Bengaluru

P.S. The cities in your desired output don't match their coordinates.

P.P.S. This can take a while, because the function queries the API once for every row.
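One way to reduce the number of requests is to geocode each distinct coordinate pair only once and map the answers back to the rows. A sketch, assuming a hypothetical helper named geocode_column and that rounding to 4 decimal places (about 11 m of latitude) is precise enough for your data; the demo uses a stand-in geocoder so it runs offline:

```python
import pandas as pd


def geocode_column(df, geocode, lat_col="latitude", lon_col="longitude", decimals=4):
    """Return a Series with geocode(lat, lon) for every row, calling geocode
    only once per distinct (rounded) coordinate pair."""
    keys = list(zip(df[lat_col].round(decimals), df[lon_col].round(decimals)))
    cache = {}
    for key in keys:
        if key not in cache:
            cache[key] = geocode(*key)  # only new pairs trigger a lookup
    return pd.Series([cache[key] for key in keys], index=df.index)


# Demo with a stand-in geocoder that records how often it is called
calls = []

def fake_geocode(lat, lon):
    calls.append((lat, lon))
    return f"{lat:.1f},{lon:.1f}"

demo = pd.DataFrame({"latitude": [13.186810, 13.186811, 28.512889],
                     "longitude": [77.643769, 77.643769, 77.088154]})
demo["place"] = geocode_column(demo, fake_geocode)
print(len(calls))  # 2: the first two rows share one rounded pair
```

With the answer's location function this could be called as df['locality'] = geocode_column(df, lambda lat, lon: location(lat, lon)[0]). It only saves requests when rows repeat (or nearly repeat) coordinates.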

You can also write the location function with broad latitude/longitude ranges, but this is very crude and a range may cover too wide an area. You can then apply the function in the same way as shown above:

def location(lat, long):
    # Crude 1-degree boxes; the labels are copied from the question's
    # desired output and do not match the real locations
    if 9 <= lat < 10 and 76 <= long < 77:
        return 'Chennai'
    elif 13 <= lat < 14 and 77 <= long < 78:
        return 'Delhi'
    elif 28 <= lat < 29 and 77 <= long < 78:
        return 'Mumbai'
    elif 30 <= lat < 31 and 76 <= long < 77:
        return 'Bengaluru'
    return None  # outside every box

>>> df['city'] = np.vectorize(location)(df['latitude'], df['longitude'])
>>> df
   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi
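A somewhat less crude offline alternative, swapped in here and not part of the original answer, is to pick the nearest of a few reference city centres using the haversine great-circle distance, and to leave a row unassigned when no centre is close enough. A sketch; the city-centre coordinates are approximate values added for illustration, and the 50 km cutoff is an arbitrary assumption:

```python
import numpy as np
import pandas as pd

# Approximate city-centre coordinates (illustrative values; verify before relying on them)
CITIES = pd.DataFrame({
    "city": ["Bengaluru", "Delhi", "Mumbai", "Chennai"],
    "lat": [12.97, 28.61, 19.08, 13.08],
    "lon": [77.59, 77.21, 72.88, 80.27],
})
EARTH_RADIUS_KM = 6371.0


def nearest_city(lat, lon, cities=CITIES, max_km=50.0):
    """Return the closest reference city, or None if none lies within max_km."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    lat2 = np.radians(cities["lat"].to_numpy())
    lon2 = np.radians(cities["lon"].to_numpy())
    # Haversine formula: great-circle distance from (lat, lon) to every city
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    i = int(np.argmin(dist_km))
    return cities["city"].iloc[i] if dist_km[i] <= max_km else None


df = pd.DataFrame({
    "userId": [121165, 95592, 111435, 129532, 95592],
    "latitude": [30.314368, 13.186810, 28.512889, 9.828420, 13.121986],
    "longitude": [76.384381, 77.643769, 77.088154, 76.310357, 77.610539],
})
df["city"] = [nearest_city(lat, lon) for lat, lon in zip(df["latitude"], df["longitude"])]
print(df)
```

Unlike fixed boxes, the distance cutoff avoids labelling far-away points: with this tiny reference table the first and fourth rows stay None because none of the four listed cities is within 50 km of them. A practical lookup would need a much fuller list of cities.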

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM