
Geocoding, iterrows() and itertuples do not get the job done for a larger DataFrame

I'm trying to add coordinates to a set of addresses saved in an Excel file, using the Google Geocoding API. See the code below:

for i, row in df.iterrows():

    #below combines the address columns together in one variable, to push to the geocoder API.
    apiAddress = str(df.at[i, 'adresse1']) + ',' + str(df.at[i, 'postnr']) + ',' + str(df.at[i, 'By']) 
    
    #below creates a dictionary with the API key and the address info, to push to the Geocoder API on each iteration
    parameters = {
        'key' : API_KEY,
        'address' : apiAddress
        }
    #response from the API, based on the input url + the dictionary above.
    response = requests.get(base_url, params = parameters).json() 
    #when you look at the response, it is given as a dictionary. with this command I access the geometry part of the dictionary.
    geometry = response['results'][0]['geometry']
    
    #within the geometry part of the dictionary given by the API, I access the lat and lng respectively.
    lat = geometry['location']['lat'] 
    lng = geometry['location']['lng']
    
    #here I append the lat / lng to a new column in the dataframe for each iteration.
    df.at[i, 'Geo_Lat_New'] = lat
    df.at[i, 'Geo_Lng_New'] = lng


#printing the first 10 rows.    
print(df.head(10))

The above code works perfectly fine for 20 addresses. But when I run it on the entire dataset of 90,000 addresses using iterrows(), I get an IndexError:

  File "C:\Users\...", line 29, in <module>
    geometry = response['results'][0]['geometry']

IndexError: list index out of range

Using itertuples() instead, with:

for i, row in df.itertuples():

I get a ValueError:


  File "C:\Users\...", line 22, in <module>
    for i, row in df.itertuples():

ValueError: too many values to unpack (expected 2)

when I use:

for i in df.itertuples():

I get a complicated KeyError that is too long to put here.
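For context on the two itertuples() errors: itertuples() yields one namedtuple per row rather than an (index, row) pair, so it cannot be unpacked into two names, and passing the whole tuple to df.at as a row label is a likely cause of the KeyError. A minimal sketch of how the rows can be read instead, assuming a made-up two-row frame (the sample values are hypothetical; only the column names come from the code above):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
sample = pd.DataFrame(
    {
        "adresse1": ["Main Street 1", "Side Road 2"],
        "postnr": [1000, 2000],
        "By": ["Town A", "Town B"],
    }
)

addresses = {}
for row in sample.itertuples():
    # row.Index is the index label; each column is an attribute of the namedtuple.
    addresses[row.Index] = ",".join([str(row.adresse1), str(row.postnr), str(row.By)])

print(addresses)
```

The index label from row.Index can then be used with df.at to write results back, in the same way i is used in the iterrows() version.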

Any suggestions on how to properly add coordinates for each address in the entire DataFrame?

Update: in the end I found out what the issue was. The Google Geocoding API only handles 50 requests per second. Therefore I used the following code to take a 1-second break after every 49 requests:

if count == 49:
    print('Taking a 1 second break, total count is:', total_count)
    time.sleep(1)
    count = 0

Here count tracks the number of iterations since the last pause. As soon as it hits 49, the if statement above runs, taking a 1-second break and resetting count to zero.
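The snippet above only shows the pause itself, so here is a rough sketch of how the counter might sit inside a complete loop. The function name geocode_throttled and its parameters are assumptions made for illustration: geocode stands in for whatever function sends one request, and sleep is injectable so the pausing can be checked without actually waiting.

```python
import time


def geocode_throttled(addresses, geocode, batch_size=49, pause=1.0, sleep=time.sleep):
    """Call geocode(address) for each address, pausing after every batch_size calls."""
    results = []
    count = 0  # requests made since the last pause
    for total_count, address in enumerate(addresses, start=1):
        results.append(geocode(address))
        count += 1
        if count == batch_size:
            print("Taking a break, total count is:", total_count)
            sleep(pause)
            count = 0
    return results


# Usage with a stand-in geocoder (no network access) and a recorded sleep.
pauses = []
out = geocode_throttled(
    [str(n) for n in range(100)],
    geocode=lambda address: "geo-" + address,
    sleep=pauses.append,
)
```

In real use, geocode would wrap the requests.get call from the question and sleep would be left at its default.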

Although you have already found the error (the Google API limits the number of requests that can be made per second), it usually isn't good practice to use a for loop with pandas. Therefore, I would rewrite your code to take advantage of pd.DataFrame.apply.

import time

import pandas as pd
import requests


def get_geometry(row: pd.Series, API_KEY: str, base_url: str, tries: int = 0):

    # str.join takes a single iterable of strings, so cast each column
    # to str (postnr may be numeric) and pass them as one sequence
    apiAddress = ",".join(str(row[col]) for col in ("adresse1", "postnr", "By"))

    parameters = {"key": API_KEY, "address": apiAddress}

    try:
        response = requests.get(base_url, params = parameters).json()
        geometry = response["results"][0]["geometry"]
    except IndexError: # empty "results" list, e.g. when the limit is reached
        # sleep to make the next 50 requests, but
        # beware that consistently reaching limits could
        # further limit sending requests.
        # this is why you might want to keep track of how
        # many tries you have already done, so as to stop the process
        # if a threshold has been met
        if tries > 3: # tries > arbitrary threshold
            raise
        time.sleep(1)
        return get_geometry(row, API_KEY, base_url, tries + 1)
    else:
        # only runs when the lookup in the try block succeeded
        return geometry["location"]["lat"], geometry["location"]["lng"]

# pass kwargs to apply function and iterate over every row
lat_lon = df.apply(get_geometry, API_KEY = API_KEY, base_url = base_url, axis = 1)

df["Geo_Lat_New"] = lat_lon.apply(lambda latlon: latlon[0])
df["Geo_Lng_New"] = lat_lon.apply(lambda latlon: latlon[1])
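One caveat on the except IndexError branch: an empty results list does not by itself say why the lookup failed. The Geocoding API response also carries a status field (for example OK, ZERO_RESULTS or OVER_QUERY_LIMIT), so a retry could be limited to the rate-limit case and addresses that simply cannot be found could be skipped. A sketch of that check, where extract_location is a hypothetical helper and the sample responses are hand-written rather than real API output:

```python
def extract_location(response: dict):
    """Return (lat, lng) for status OK and (None, None) for ZERO_RESULTS.

    Any other status raises RuntimeError so the caller can decide whether to retry.
    """
    status = response.get("status")
    if status == "OK":
        location = response["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]
    if status == "ZERO_RESULTS":
        # The address could not be geocoded; retrying would not help.
        return None, None
    raise RuntimeError(f"Geocoding failed with status {status!r}")


# Hand-written sample responses (not real API output).
found = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 55.0, "lng": 12.0}}}],
}
not_found = {"status": "ZERO_RESULTS", "results": []}
limited = {"status": "OVER_QUERY_LIMIT", "results": []}

print(extract_location(found))
print(extract_location(not_found))
```

With this in place, get_geometry could catch RuntimeError instead of IndexError, sleep, and retry, while rows that return ZERO_RESULTS end up with missing coordinates instead of stopping the whole run.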
