Why does this list return identical values?
I'm trying to scrape http://www.virginiaequestrian.com/main.cfm?action=greenpages&GPType=8 for all of its table values and put them in a list, one dict per table row. For some reason I can't understand, appending the info dict to the data list only puts in one value, repeated 364 times (the length of the table). I printed each row and value separately in the loop, so I know I am grabbing the right elements and values, but everything seems to break down when I try to put the values in the data list.
Can somebody please tell me what I'm doing wrong?
    from bs4 import BeautifulSoup
    import requests

    r = requests.get('http://www.virginiaequestrian.com/main.cfm?action=greenpages&GPType=8')
    soup = BeautifulSoup(r.content, 'html5lib')
    data = []
    info = {}
    tbl = soup.findAll('table')[2]
    for tr in tbl.findAll('tr')[3:]:
        for td in tr.findAll('td')[0]:
            value = td.string
            info['Name'] = value
        for td in tr.findAll('td')[1]:
            value = td.string
            info['City'] = value
        for td in tr.findAll('td')[2]:
            value = td.string
            info['Phone'] = value
        for td in tr.findAll('td')[3]:
            value = "http://www.virginiaequestrian.com/{}".format(td.a['href'])
            info['ListURL'] = value
        data.append(info)
    print(data)
Variables and list elements in Python hold references to objects (like your info dict); they do not hold copies. What you are basically doing when calling data.append(info) is appending a reference to the same dict over and over again. Every entry in data therefore points at one object, and that object holds whatever the last iteration wrote into it.
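The effect is easy to reproduce without any scraping. The following is a minimal sketch, with made-up names standing in for the table rows, that appends one dict three times and then checks that all three list entries are the same object:

```python
# One dict is created once and reused on every iteration.
data = []
info = {}

# The names below are invented sample values, not data from the site.
for name in ["Alpha Farm", "Beta Stables", "Gamma Ranch"]:
    info["Name"] = name   # overwrites the key in the single shared dict
    data.append(info)     # appends a reference to that dict, not a copy

print(data)

# Every list entry is the very same object as info.
same_object = all(entry is info for entry in data)
print(same_object)
```

All three entries show the last name written, which is the behaviour described in the question.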
What you can do is either (re)create your info dict at each iteration of the outermost for-loop:

    for tr in tbl.findAll('tr')[3:]:
        info = {}
        ...

or append a copy of your dict to your list:

    data.append(info.copy())

so that a new object is created each time.
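Both fixes can be compared side by side in a small sketch. The rows here are invented sample tuples rather than scraped cells, and the list names fresh and copied are only for illustration:

```python
# Invented sample rows standing in for the scraped table.
rows = [("Alpha Farm", "Richmond"), ("Beta Stables", "Norfolk")]

# Fix 1: create a brand-new dict inside the loop.
fresh = []
for name, city in rows:
    info = {}
    info["Name"] = name
    info["City"] = city
    fresh.append(info)

# Fix 2: keep one working dict, but append a shallow copy of it.
copied = []
info = {}
for name, city in rows:
    info["Name"] = name
    info["City"] = city
    copied.append(info.copy())

print(fresh)
print(copied)
```

Note that dict.copy() makes a shallow copy. That is enough here because the values are plain strings; a dict holding nested lists or dicts would need copy.deepcopy to be fully independent.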
You can also simplify your inner for-loops, as iterating over a single value is not really necessary:

    for td in tr.findAll('td')[0]:
        value = td.string
        info['Name'] = value
    for td in tr.findAll('td')[1]:
        value = td.string
        info['City'] = value
    for td in tr.findAll('td')[2]:
        value = td.string
        info['Phone'] = value
    for td in tr.findAll('td')[3]:
        value = "http://www.virginiaequestrian.com/{}".format(td.a['href'])
        info['ListURL'] = value

can become

    name, city, phone, url = tr.findAll('td')[:4]
    info['Name'] = name.string
    info['City'] = city.string
    info['Phone'] = phone.string
    info['ListURL'] = "http://www.virginiaequestrian.com/{}".format(url.a['href'])
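
Putting the two ideas together, each row can be unpacked and turned into a new dict in one step. The sketch below uses plain lists of strings in place of the BeautifulSoup cells so it runs on its own; row_to_info, BASE_URL, and the sample rows are hypothetical names and values, not part of the original code. One thing it makes visible is that unpacking four names from a [:4] slice raises ValueError when a row has fewer than four cells, so short rows may need to be skipped or handled.

```python
BASE_URL = "http://www.virginiaequestrian.com/{}"


def row_to_info(cells):
    """Hypothetical helper: build a new dict from the first four cells of a row."""
    # Raises ValueError if the row has fewer than four cells.
    name, city, phone, href = cells[:4]
    return {
        "Name": name,
        "City": city,
        "Phone": phone,
        "ListURL": BASE_URL.format(href),
    }


# Invented sample rows; the fifth cell shows that extra cells are ignored.
rows = [
    ["Alpha Farm", "Richmond", "555-0100", "listing-1.html", "extra"],
    ["Beta Stables", "Norfolk", "555-0101", "listing-2.html", "extra"],
]

# A new dict is returned for every row, so no entry shares state with another.
data = [row_to_info(cells) for cells in rows]
print(data)
```

Because the dict is built inside the helper, there is no shared info object left to alias.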