Assign city names from latitude/longitude values in a pandas DataFrame

Question

I have this DataFrame:

    userId      latitude    longitude        dateTime
0   121165      30.314368   76.384381   2018-02-01 00:01:57
1   95592       13.186810   77.643769   2018-02-01 00:02:17
2   111435      28.512889   77.088154   2018-02-01 00:04:02
3   129532      9.828420    76.310357   2018-02-01 00:06:03
4   95592       13.121986   77.610539   2018-02-01 00:08:54

I want to create a new DataFrame column, for example:

    userId      latitude    longitude        dateTime            city
0   121165      30.314368   76.384381   2018-02-01 00:01:57   Bengaluru
1   95592       13.186810   77.643769   2018-02-01 00:02:17   Delhi
2   111435      28.512889   77.088154   2018-02-01 00:04:02   Mumbai
3   129532      9.828420    76.310357   2018-02-01 00:06:03   Chennai
4   95592       13.121986   77.610539   2018-02-01 00:08:54   Delhi

I found this code here, but it did not work.

This is the code given there:

from urllib2 import urlopen
import json

def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country

# zip pairs each latitude with its longitude; keep the results instead of discarding them
places = [getplace(i, j) for i, j in zip(df['latitude'], df['longitude'])]

I get an error at this line:

components = j['results'][0]['address_components']

IndexError: list index out of range
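The IndexError means j['results'] came back as an empty list, so [0] has nothing to index. The Geocoding API's JSON response also carries a status field (for example OK, ZERO_RESULTS or REQUEST_DENIED) and, when the status is not OK, may include an error_message, so checking those first gives a much clearer error. A minimal sketch, assuming the response has already been parsed into a dict; the helper name first_address_components is hypothetical:

```python
def first_address_components(response):
    """Return the address_components of the first result, or None if there are none.

    `response` is the already-parsed JSON dict from the Geocoding API.
    (Hypothetical helper for illustration; not part of any library.)
    """
    status = response.get("status")
    if status == "ZERO_RESULTS":
        # The request was valid, but nothing was found at these coordinates.
        return None
    if status != "OK":
        # REQUEST_DENIED, OVER_QUERY_LIMIT, INVALID_REQUEST, ...: report the reason
        # instead of failing later with "list index out of range".
        message = response.get("error_message", "no error_message in response")
        raise RuntimeError(f"Geocoding failed with status {status!r}: {message}")
    results = response.get("results", [])
    return results[0]["address_components"] if results else None
```

Printing the status and error_message this way usually shows whether the lookup genuinely found nothing or the request itself was rejected.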

I entered some other latitude/longitude values for the UK and got results, but it does not work for Indian states.

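So now I want to try something like this:

def assign_city(i, j):
    if 79 <= i < 80 and 83 <= j < 84:
        return 'Bengaluru'
    elif 13 <= i < 14 and 70 <= j < 71:
        return 'Delhi'

df['City'] = [assign_city(i, j) for i, j in zip(df['latitude'], df['longitude'])]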

And so on. How can I assign cities from the latitude and longitude values in a more practical way?

Answer 1

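The snippet you are using is from 2013; the Google API has changed, and 'postal_town' is no longer available.

You can use the following code, which uses the requests library and guards against the case where no results are returned.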

In [47]: import requests

In [48]: def location(lat, long):
    ...:     url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false'.format(lat, long)
    ...:     r = requests.get(url)
    ...:     r_json = r.json()
    ...:     if len(r_json['results']) < 1: return None, None
    ...:     res = r_json['results'][0]['address_components']
    ...:     country  = next((c['long_name'] for c in res if 'country' in c['types']), None)
    ...:     locality = next((c['long_name'] for c in res if 'locality' in c['types']), None)
    ...:     return locality, country
    ...:
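Google Maps Platform now requires an API key on Geocoding requests (passed as the key parameter), and the sensor parameter is no longer used. Below is a sketch of the same function that passes the query through the params argument of requests.get, assuming you have your own key. The get hook (so the function can be exercised without network access) and the 10-second timeout are illustrative choices, not part of the original answer:

```python
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def location(lat, lng, api_key, get=None):
    """Reverse-geocode (lat, lng) to (locality, country); (None, None) if nothing is found."""
    if get is None:
        import requests  # imported lazily so a test can pass its own `get`
        get = requests.get
    r = get(GEOCODE_URL, params={"latlng": f"{lat},{lng}", "key": api_key}, timeout=10)
    r_json = r.json()
    results = r_json.get("results", [])
    if not results:
        # Covers ZERO_RESULTS as well as errors such as REQUEST_DENIED (missing/invalid key).
        return None, None
    res = results[0]["address_components"]
    country = next((c["long_name"] for c in res if "country" in c["types"]), None)
    locality = next((c["long_name"] for c in res if "locality" in c["types"]), None)
    return locality, country
```

For example: df['locality'] = [location(a, b, API_KEY)[0] for a, b in zip(df['latitude'], df['longitude'])], where API_KEY holds your own key.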

In [49]: location(28.512889, 77.088154)
Out[49]: ('Gurugram', 'India')

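This function looks for 'locality', which in fact returns nothing for the second row of the DataFrame. You can choose the field you want by inspecting the result (this is the output for lat, long = 28.512889, 77.088154):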

[{'long_name': 'Udyog Vihar',
  'short_name': 'Udyog Vihar',
  'types': ['political', 'sublocality', 'sublocality_level_2']},
 {'long_name': 'Kapas Hera Estate',
  'short_name': 'Kapas Hera Estate',
  'types': ['political', 'sublocality', 'sublocality_level_1']},
 {'long_name': 'Gurugram',
  'short_name': 'Gurugram',
  'types': ['locality', 'political']},
 {'long_name': 'Gurgaon',
  'short_name': 'Gurgaon',
  'types': ['administrative_area_level_2', 'political']},
 {'long_name': 'Haryana',
  'short_name': 'HR',
  'types': ['administrative_area_level_1', 'political']},
 {'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']},
 {'long_name': '122016', 'short_name': '122016', 'types': ['postal_code']}]
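Because not every coordinate has a 'locality' component (the second row is one such case), one option is to try several component types in order of preference. A sketch, assuming components is an address_components list like the one above; the helper name pick_place and the fallback order are illustrative choices, not something the API prescribes:

```python
def pick_place(components, preferred=("locality",
                                      "administrative_area_level_2",
                                      "administrative_area_level_1")):
    """Return the long_name of the component matching the most preferred type, else None."""
    for wanted in preferred:
        for c in components:
            # Exact membership test: 'sublocality' does not match 'locality'.
            if wanted in c["types"]:
                return c["long_name"]
    return None
```

With the list above this returns 'Gurugram'; if the locality entry were missing it would fall back to 'Gurgaon'.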

To apply it to the DataFrame, you can use NumPy's vectorize like this (keep in mind that the second row will return nothing):

In [71]: import numpy as np

In [72]: df['locality'] = np.vectorize(location)(df['latitude'], df['longitude'])[0]  # location returns (locality, country); [0] keeps the localities

In [73]: df
Out[73]:
   userId   latitude  longitude             dateTime   locality
0  121165  30.314368  76.384381  2018-02-01 00:01:57    Patiala
1   95592  13.186810  77.643769  2018-02-01 00:02:17       None
2  111435  28.512889  77.088154  2018-02-01 00:04:02   Gurugram
3  129532   9.828420  76.310357  2018-02-01 00:06:03  Ezhupunna
4   95592  13.121986  77.610539  2018-02-01 00:08:54  Bengaluru
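Because location returns a (locality, country) pair, you can also fill two columns in one pass by splitting the pairs. A sketch with a stand-in lookup: fake_location below is hypothetical and exists only so the example runs without calling the API:

```python
def split_pairs(lookup, lats, lons):
    """Call lookup(lat, lon) -> (locality, country) per row and return two lists."""
    pairs = [lookup(lat, lon) for lat, lon in zip(lats, lons)]
    localities = [p[0] for p in pairs]
    countries = [p[1] for p in pairs]
    return localities, countries


# Hypothetical stand-in for the API-backed location function.
def fake_location(lat, lon):
    known = {(28.512889, 77.088154): ("Gurugram", "India")}
    return known.get((lat, lon), (None, None))


localities, countries = split_pairs(fake_location,
                                    [28.512889, 13.186810],
                                    [77.088154, 77.643769])
```

With the real function and DataFrame this would be df['locality'], df['country'] = split_pairs(location, df['latitude'], df['longitude']).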

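PS: I noticed that the cities in your desired output are assigned to the wrong coordinates.

PPS: Also note that this may take some time, because the function has to query the API once for every row.

You could also write a cruder location function based on broad coordinate ranges, but it would be very rough and would likely cover too large an area. You can then apply it in the same way as shown above: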
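One way to cut the number of requests is to cache results so that repeated or nearly identical coordinates are only looked up once. A sketch using functools.lru_cache; rounding to 3 decimal places (about 110 m of latitude) so that nearby points share a cache entry is an assumption you should tune, and lookup stands in for any function like location above (make_cached_lookup is a hypothetical name):

```python
from functools import lru_cache


def make_cached_lookup(lookup, ndigits=3):
    """Wrap lookup(lat, lon) so each rounded coordinate pair is queried only once."""

    @lru_cache(maxsize=None)
    def _cached(lat_r, lon_r):
        return lookup(lat_r, lon_r)

    def cached(lat, lon):
        # Rounding means nearby points share one API call (and therefore one answer).
        return _cached(round(float(lat), ndigits), round(float(lon), ndigits))

    return cached
```

For example: lookup = make_cached_lookup(location), then df['locality'] = [lookup(a, b)[0] for a, b in zip(df['latitude'], df['longitude'])].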

In [21]: def location(lat, long):
    ...:     if 9 <= lat < 10 and 76 <= long < 77:
    ...:         return 'Chennai'
    ...:     elif 13 <= lat < 14 and 77 <= long < 78:
    ...:         return 'Delhi'
    ...:     elif 28 <= lat < 29 and 77 <= long < 78:
    ...:         return 'Mumbai'
    ...:     elif 30 <= lat < 31 and 76 <= long < 77:
    ...:         return 'Bengaluru'
    ...:     

In [22]: df['city'] = np.vectorize(location)(df['latitude'], df['longitude'])

In [23]: df
Out[23]: 
   userId   latitude  longitude             dateTime       city
0  121165  30.314368  76.384381  2018-02-01 00:01:57  Bengaluru
1   95592  13.186810  77.643769  2018-02-01 00:02:17      Delhi
2  111435  28.512889  77.088154  2018-02-01 00:04:02     Mumbai
3  129532   9.828420  76.310357  2018-02-01 00:06:03    Chennai
4   95592  13.121986  77.610539  2018-02-01 00:08:54      Delhi
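A somewhat less crude offline alternative to fixed boxes is to assign each point to the nearest of a list of known city centres, and to return None when even the nearest one is too far away. This is a different technique from the range checks above: a nearest-neighbour lookup using the haversine great-circle distance. The coordinates in CITY_CENTRES (a hypothetical table) are approximate, and the 100 km cut-off is an arbitrary assumption:

```python
from math import asin, cos, radians, sin, sqrt

# Approximate city-centre coordinates (lat, lon); illustrative, not authoritative.
CITY_CENTRES = {
    "Bengaluru": (12.97, 77.59),
    "Chennai": (13.08, 80.27),
    "Delhi": (28.61, 77.21),
    "Mumbai": (19.08, 72.88),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, treating the Earth as a sphere of radius 6371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def nearest_city(lat, lon, max_km=100):
    """Name of the closest known city, or None if it is more than max_km away."""
    name, dist = min(
        ((city, haversine_km(lat, lon, clat, clon))
         for city, (clat, clon) in CITY_CENTRES.items()),
        key=lambda item: item[1],
    )
    return name if dist <= max_km else None
```

Applied with df['city'] = [nearest_city(a, b) for a, b in zip(df['latitude'], df['longitude'])], rows 1, 2 and 4 map to Bengaluru, Delhi and Bengaluru, while rows 0 and 3 come back as None because no listed city is within 100 km; add the cities you care about to the table.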

Solution 1
0 Accepted 2018-08-02 05:24:27