I have loaded a DataFrame with a number of columns, one of which contains an address. I'm using the Python geocoder module to get the latitude/longitude for every address in this CSV.
Pandas
1) How do I add new columns? Should I add them as I iterate through the rows, or should I add them at the start?
2) In my code below, I am trying to iterate through every row in the DataFrame. For every row, I call geocoder.google(). Column 16 of my CSV/DataFrame contains the address.
How would I refer to that address column while iterating through the rows? I get "IndexError: tuple index out of range" if I run the code as it is.
CSV
3) The second part of my code does a similar thing with the csv module. I read in a CSV, loop through every row and call the geocoder as before. The geocoder returns a list of 2 values (the two coordinates, [XXXX,XXXX]). I am trying to write each original row followed by two more columns holding the two coordinates. I get "TypeError: can only concatenate list (not "float") to list".
import geocoder
import csv
import pandas as pd
import time

df = pd.read_csv("RSM100_1995.csv",header=None)
print(df.head())
for row in df.iterrows():
    g = geocoder.google(row[16])
    print(row[16],g.latlng)
    time.sleep(2)

with open("RSM100_1995.csv","r") as f, open("RSM_GCTest.csv","w",newline='') as g:
    rdr = csv.reader(f)
    wtr = csv.writer(g)
    for r in rdr:
        gc = geocoder.google(str(r[16]))
        print(r[16],gc.latlng)
        wtr.writerow(r + gc.latlng[0]+gc.latlng[1])
        time.sleep(2)
By the way, I am using time.sleep(2) because the geocoder limits the number of requests. I don't run the code exactly as shown here; I've laid it out like this for display.
If anyone has a better way of geocoding UK addresses using Python, let me know.
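On question 3: with the csv module each row `r` is a list of strings, while `gc.latlng[0]` is a single float, so `r + gc.latlng[0]` asks Python to add a float to a list, which raises that TypeError. Adding two lists works, so appending the whole `latlng` list does what was intended. The sketch below assumes a hypothetical `fake_latlng()` stub in place of `geocoder.google(address).latlng` (assumed to return `[lat, lng]`, or something falsy such as `None` when a lookup fails) and uses in-memory `io.StringIO` buffers instead of the real files so it runs offline; the address sits in column 1 here rather than 16.

```python
import csv
import io


def fake_latlng(address):
    # Hypothetical stand-in for geocoder.google(address).latlng:
    # returns a two-element [lat, lng] list, or None if nothing was found
    return [52.0, -1.0] if address else None


# In-memory stand-ins for RSM100_1995.csv (input) and RSM_GCTest.csv (output)
src = io.StringIO("1,SOME ADDRESS\n2,\n")
dst = io.StringIO()

rdr = csv.reader(src)
wtr = csv.writer(dst)
for r in rdr:
    latlng = fake_latlng(r[1])  # r[16] in the real file
    # list + list is fine; list + float is what raised the TypeError
    wtr.writerow(r + (latlng if latlng else ["", ""]))

rows = list(csv.reader(io.StringIO(dst.getvalue())))
print(rows)
```

Writing empty strings for failed lookups keeps every output row the same width; that is a choice made in this sketch, not something the csv module requires.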
Edit:
For Chirag: I've made the changes you mentioned. I've also tried replacing 'Address' in the code below with the column index (16), with the same result.
I've added column headers by assigning a list to RS1995.columns.
I'm now getting a very long error traceback that runs through many different files.
RS1995 = pd.read_csv("RSM100_1995.csv",header=None)
RS1995.columns = ['ID','Price','Date','Postcode','X','Y','Z','PAON','SAON','Street','Locality','District','City','County','A','B','Address','XX']
print(RS1995.head())
for row in RS1995.iterrows():
    RS1995['lat'] = geocoder.google(RS1995['Address']).latlng[0]
    RS1995['lng'] = geocoder.google(RS1995['Address']).latlng[1]
    print(RS1995.head())
    time.sleep(2)
In terms of the CSV: there are 18 columns (indices 0-17), named as above. The 'Address' column is the one I want to pass to the geocoder. It is a concatenation of 'PAON', 'SAON', 'Street', 'Locality', 'County' and 'Postcode'. I could have included 'City' too. I did all the concatenation with the csv module.
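For comparison, the same concatenation can be done in pandas instead of the csv module. This sketch uses the two sample rows printed under Edit 2; skipping missing parts with `pd.notna` is an assumption about how empty fields such as `SAON` should be handled, and it keeps a literal "nan" out of the address.

```python
import pandas as pd

parts = ["PAON", "SAON", "Street", "Locality", "County", "Postcode"]

# Two sample rows taken from the printed output under Edit 2
df = pd.DataFrame({
    "PAON": ["1", "67"],
    "SAON": [float("nan"), float("nan")],
    "Street": ["WOODFORD WAY", "BYERLEY ROAD"],
    "Locality": ["HEATH HAYES", "PORTSMOUTH"],
    "County": ["STAFFORDSHIRE", "PORTSMOUTH"],
    "Postcode": ["WS12 3XJ", "PO1 5AY"],
})

# Join the non-missing parts of each row with single spaces
df["Address"] = df[parts].apply(
    lambda row: " ".join(str(v) for v in row if pd.notna(v)), axis=1
)
print(df["Address"].tolist())
```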
If it helps, here is the geocoder documentation:
http://geocoder.readthedocs.io/
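On the Edit 1 code: `RS1995['Address']` is the whole column (a pandas Series), so `geocoder.google(RS1995['Address'])` is handed every address at once rather than one string, and assigning a single value to `RS1995['lat']` writes that same value into every row. The sketch below shows both behaviours with a hypothetical `fake_latlng()` stub and made-up coordinates; `Series.apply` calls the function once per element, which is one way to get a per-row result.

```python
import pandas as pd


def fake_latlng(address):
    # Hypothetical stand-in for geocoder.google(address).latlng (made-up values)
    lookup = {"ADDR A": [10.0, 20.0], "ADDR B": [30.0, 40.0]}
    return lookup.get(address)


df = pd.DataFrame({"Address": ["ADDR A", "ADDR B"]})

# A scalar assigned to a column is broadcast to every row
df["same"] = 1.5

# Series.apply calls fake_latlng once per address string
latlngs = df["Address"].apply(fake_latlng)
df["lat"] = [p[0] if p else None for p in latlngs]
df["lng"] = [p[1] if p else None for p in latlngs]
print(df)
```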
Edit 2:
RS1995 = pd.read_csv("RSM100_1995.csv",header=None)
RS1995.columns = ['ID','Price','Date','Postcode','X','Y','Z','PAON','SAON','Street','Locality','District','City','County','A','B','Address','XX']
print(RS1995.head())
RS1995['lat'] = "x"
RS1995['lng'] = "y"
print(RS1995.head())
for row in RS1995.iterrows():
    print(row)
Whenever I run the code above, I get output like this (I've included just the last two rows as an example). What does it mean? How would I iterate through every row, geocode the address, and wait 2 seconds so I don't exceed the rate limit?
(98, ID {40E4DAC0-863F-42FE-94B4-49A70D3BE0B9}
Price 43000
Date 24/02/1995 00:00
Postcode WS12 3XJ
X S
Y N
Z F
PAON 1
SAON NaN
Street WOODFORD WAY
Locality HEATH HAYES
District CANNOCK
City CANNOCK CHASE
County STAFFORDSHIRE
A A
B A
Address 1 WOODFORD WAY HEATH HAYES STAFFORDSHIRE WS12...
XX 1 WOODFORD WAY HEATH HAYES STAFFORDSHIRE WS12...
lat x
lng y
Name: 98, dtype: object)
(99, ID {061625F8-82D5-43CF-A55F-4288979D31EC}
Price 42995
Date 01/09/1995 00:00
Postcode PO1 5AY
X T
Y N
Z F
PAON 67
SAON NaN
Street BYERLEY ROAD
Locality PORTSMOUTH
District PORTSMOUTH
City PORTSMOUTH
County PORTSMOUTH
A A
B A
Address 67 BYERLEY ROAD PORTSMOUTH PORTSMOUTH PO1 5AY
XX 67 BYERLEY ROAD PORTSMOUTH PORTSMOUTH PO1 5AY
lat x
lng y
Name: 99, dtype: object)
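The output above shows what `iterrows()` yields: each item is a 2-tuple of the index label (98, 99) and the row itself as a Series. That is also why `row[16]` in the first loop raised "IndexError: tuple index out of range": the tuple only has positions 0 and 1. Unpacking the tuple gives direct access to the row's fields. Below is one possible way to geocode row by row with a pause between requests; `geocode_rows` and `fake_latlng` are hypothetical names, the stub stands in for `geocoder.google(address).latlng`, and the example call uses `delay=0` so it runs instantly.

```python
import time

import pandas as pd


def fake_latlng(address):
    # Hypothetical stand-in for geocoder.google(address).latlng (made-up values)
    return [1.0, 2.0] if address else None


def geocode_rows(df, geocode, delay=2.0):
    """Fill df['lat'] and df['lng'] one row at a time, pausing between lookups."""
    df["lat"] = float("nan")  # create the columns up front, filled with NaN
    df["lng"] = float("nan")
    for idx, row in df.iterrows():        # each item is an (index, Series) tuple
        latlng = geocode(row["Address"])  # row["Address"] is a single string
        if latlng:                        # leave NaN when the lookup fails
            df.at[idx, "lat"] = latlng[0]
            df.at[idx, "lng"] = latlng[1]
        time.sleep(delay)                 # stay under the request rate limit
    return df


df = pd.DataFrame({"Address": ["SOME ADDRESS", ""]}, index=[98, 99])
geocode_rows(df, fake_latlng, delay=0)
print(df)
```

With `delay=2.0`, 100 rows take at least 200 seconds. Creating `lat`/`lng` before the loop and filling them with `df.at` is a choice made for this sketch; collecting the results in a list and assigning both columns once after the loop would also work.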
You can create new columns in a pandas dataframe similar to how you would use an associative array or dictionary. You can create two new columns for your latitude and longitude like so:
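# geocode one address string per row (passing all of df[16] would send the whole column)
latlngs = df[16].apply(lambda address: geocoder.google(address).latlng)
df['lat'], df['lng'] = latlngs.str[0], latlngs.str[1]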
Then you can write the entire dataframe to a csv:
df.to_csv('RSM_GCTest.csv')
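One detail about `to_csv`: by default it also writes the DataFrame's index as an unnamed first column, and because the file was read with `header=None` the integer column labels become a header row. The standard `to_csv` parameters `index=False` (and `header=False`, if the output should stay headerless like the input) control this. A small sketch, writing to an in-memory buffer instead of RSM_GCTest.csv, with made-up values:

```python
import io

import pandas as pd

df = pd.DataFrame({16: ["ADDR A", "ADDR B"], "lat": [1.0, 3.0], "lng": [2.0, 4.0]})

buf = io.StringIO()          # stands in for the output file
df.to_csv(buf, index=False)  # omit the 0, 1, 2, ... row labels
print(buf.getvalue())

with_index = df.to_csv()     # with no path, to_csv returns the CSV text
print(with_index)
```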