Adding Columns to pandas dataframe & iterating through one of the columns
I have loaded a dataframe with a number of columns, one of which contains an address. I'm using a Python geocoder module to get the lat/long for every address in this CSV.
Pandas
1) How do I add new columns? Should I add the columns as I iterate through the rows, or should I add them at the start?
2) In my code below, I am trying to iterate through every row in the data frame. For every row, I call the geocoder.google() method. Column 16 of my CSV/data frame contains an address. How would I refer to that address column whilst iterating through all the rows? I get "IndexError: tuple index out of range" if I run the code as it is.
CSV
3) The 2nd part of my code does a similar thing with the csv module. I read in a CSV, loop through every row and call the geocoder method as described above. The geocoder method returns a list of 2 values (2 coordinates - [XXXX,XXXX]). I am trying to write out the original row followed by two more columns, one for each coordinate. I am getting "TypeError: can only concatenate list (not "float") to list".
import geocoder
import csv
import pandas as pd
import time
df = pd.read_csv("RSM100_1995.csv",header=None)
print(df.head())
for row in df.iterrows():
    g = geocoder.google(row[16])
    print(row[16],g.latlng)
    time.sleep(2)
with open("RSM100_1995.csv","r") as f, open("RSM_GCTest.csv","w",newline='') as g:
    rdr = csv.reader(f)
    wtr = csv.writer(g)
    for r in rdr:
        gc = geocoder.google(str(r[16]))
        print(r[16],gc.latlng)
        wtr.writerow(r + gc.latlng[0]+gc.latlng[1])
        time.sleep(2)
By the way, I am using time.sleep(2) since the geocoder limits the number of requests. I don't run the code exactly as it is here; I've just laid it out like this to display it.
If anyone has a better way of geocoding UK addresses using Python, let me know.
Edit:
For Chirag - I've made the changes you mentioned. I've also tried replacing 'Address' in the code below with the column index (which is 16), with the same result.
I've added column headers with X.columns.
I'm now getting a very long error message that references many different files.
RS1995 = pd.read_csv("RSM100_1995.csv",header=None)
RS1995.columns = ['ID','Price','Date','Postcode','X','Y','Z','PAON','SAON','Street','Locality','District','City','County','A','B','Address','XX']
print(RS1995.head())
for row in RS1995.iterrows():
    RS1995['lat'] = geocoder.google(RS1995['Address']).latlng[0]
    RS1995['lng'] = geocoder.google(RS1995['Address']).latlng[1]
    print(RS1995.head())
    time.sleep(2)
In terms of the CSV - there are 17 columns, I've titled them up above. The 'Address' column is the one I want to pass through the geocoder. The Address column itself is a concatenation of 'PAON', 'SAON', 'Street', 'Locality', 'County' & 'Postcode'. I could've included 'City' too. I did all of the concatenation using the csv module.
If it helps - here is the Geocoder link:
http://geocoder.readthedocs.io/
Edit 2:
RS1995 = pd.read_csv("RSM100_1995.csv",header=None)
RS1995.columns = ['ID','Price','Date','Postcode','X','Y','Z','PAON','SAON','Street','Locality','District','City','County','A','B','Address','XX']
print(RS1995.head())
RS1995['lat'] = "x"
RS1995['lng'] = "y"
print(RS1995.head())
for row in RS1995.iterrows():
    print(row)
Whenever I run the code above, I get the output below. I've just taken the last two rows as an example. What does this mean? How would I iterate through every row, geocode the address and wait 2 seconds so I don't exceed the rate limit?
(98, ID     {40E4DAC0-863F-42FE-94B4-49A70D3BE0B9}
Price       43000
Date        24/02/1995 00:00
Postcode    WS12 3XJ
X           S
Y           N
Z           F
PAON        1
SAON        NaN
Street      WOODFORD WAY
Locality    HEATH HAYES
District    CANNOCK
City        CANNOCK CHASE
County      STAFFORDSHIRE
A           A
B           A
Address     1 WOODFORD WAY HEATH HAYES STAFFORDSHIRE WS12...
XX          1 WOODFORD WAY HEATH HAYES STAFFORDSHIRE WS12...
lat         x
lng         y
Name: 98, dtype: object)
(99, ID     {061625F8-82D5-43CF-A55F-4288979D31EC}
Price       42995
Date        01/09/1995 00:00
Postcode    PO1 5AY
X           T
Y           N
Z           F
PAON        67
SAON        NaN
Street      BYERLEY ROAD
Locality    PORTSMOUTH
District    PORTSMOUTH
City        PORTSMOUTH
County      PORTSMOUTH
A           A
B           A
Address     67 BYERLEY ROAD PORTSMOUTH PORTSMOUTH PO1 5AY
XX          67 BYERLEY ROAD PORTSMOUTH PORTSMOUTH PO1 5AY
lat         x
lng         y
Name: 99, dtype: object)
You can create new columns in a pandas dataframe much as you would add keys to an associative array or dictionary. You can create two new columns for your latitude and longitude like so:
df['lat'] = geocoder.google(df[16]).latlng[0]
df['lng'] = geocoder.google(df[16]).latlng[1]
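One caveat: as far as I can tell, geocoder.google() expects a single address rather than a whole column, and iterrows() yields (index, row) pairs, which is why row[16] raises the IndexError (the tuple only has two elements). The following is a minimal sketch of a row-by-row version, not a definitive implementation. geocode_address is a hypothetical stand-in that returns placeholder coordinates so the example runs without network access; in real use it would wrap something like geocoder.google(address).latlng, and delay would be set to 2 to respect the rate limit.

```python
import time

import pandas as pd


def geocode_address(address):
    """Hypothetical stand-in for a real geocoder call.

    Returns placeholder [lat, lng] values, or None for an unknown address.
    A real version might return geocoder.google(address).latlng instead.
    """
    placeholder_results = {"1 EXAMPLE STREET": [1.0, 2.0]}
    return placeholder_results.get(address)


def add_lat_lng(df, address_col="Address", delay=0.0):
    """Return a copy of df with 'lat' and 'lng' columns added."""
    lats, lngs = [], []
    # iterrows() yields (index, row) pairs, so unpack both parts
    # and look the address up on the row, not on the tuple.
    for _, row in df.iterrows():
        latlng = geocode_address(row[address_col])
        if latlng:
            lats.append(latlng[0])
            lngs.append(latlng[1])
        else:
            # Keep the row but leave the coordinates empty.
            lats.append(None)
            lngs.append(None)
        time.sleep(delay)  # pause between requests (e.g. delay=2)
    out = df.copy()
    out["lat"] = lats
    out["lng"] = lngs
    return out


df = pd.DataFrame({"Address": ["1 EXAMPLE STREET", "UNKNOWN PLACE"]})
result = add_lat_lng(df)
print(result)
```

Collecting the results in lists and assigning the two columns once at the end avoids overwriting the whole column on every pass through the loop.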
Then you can write the entire dataframe to a CSV:
df.to_csv('RSM_GCTest.csv')
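For the csv-module version, the TypeError comes from adding floats to a list: r is a list, while gc.latlng[0] is a float, so the two coordinates need to be appended as a list. Below is a minimal sketch of one way around this, assuming the geocoder returns a two-element [lat, lng] list, or nothing usable when the lookup fails. geocode_address is a hypothetical placeholder for the real geocoder call, and in-memory buffers stand in for the real input and output files.

```python
import csv
import io


def geocode_address(address):
    """Hypothetical stand-in for a real geocoder call.

    Returns placeholder [lat, lng] values, or None for an empty address.
    """
    return [1.0, 2.0] if address else None


def append_coords(reader, writer, address_index):
    """Copy each row to writer with two extra coordinate fields."""
    for r in reader:
        # Fall back to two empty fields when nothing was found.
        latlng = geocode_address(r[address_index]) or ["", ""]
        # r is a list, so concatenate it with another list.
        writer.writerow(r + list(latlng))


src = io.StringIO("a,1 EXAMPLE STREET\nb,\n")
dst = io.StringIO()
append_coords(csv.reader(src), csv.writer(dst), address_index=1)
output_rows = dst.getvalue().splitlines()
print(output_rows)
```

With the real files, src and dst would be the two handles opened in the with statement, and a time.sleep(2) inside the loop would keep the requests under the rate limit.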