
Why does this list return identical values?

I'm trying to scrape http://www.virginiaequestrian.com/main.cfm?action=greenpages&GPType=8 for all of its table values and put them in a list of dicts. For some reason I can't work out, appending the info dict to the data list stores the same value 364 times (the length of the table). I printed each row and value separately inside the loop, so I know I am grabbing the right elements and values, but everything breaks down when I try to put the values in the data list.

Can somebody please tell me what I'm doing wrong?

from bs4 import BeautifulSoup
import requests

r=requests.get('http://www.virginiaequestrian.com/main.cfm?action=greenpages&GPType=8')
soup=BeautifulSoup(r.content,'html5lib')

data = []
info = {}

tbl = soup.findAll('table')[2]
for tr in tbl.findAll('tr')[3:]:
    for td in tr.findAll('td')[0]:
        value= td.string
        info['Name']=value
    for td in tr.findAll('td')[1]:
        value= td.string
        info['City']=value
    for td in tr.findAll('td')[2]:
        value= td.string
        info['Phone']=value
    for td in tr.findAll('td')[3]:
        value = "http://www.virginiaequestrian.com/{}".format(td.a['href'])
        info['ListURL']=value
        data.append(info)
print data

Names in Python (like your info variable) are references to objects. Because info is created once, before the loop, every call to data.append(info) appends another reference to the same dict. Each iteration then overwrites that dict's keys, so every entry in data ends up showing the values from the last row.
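To see the effect in isolation, here is a minimal sketch that needs no scraping. The row names are made-up placeholders, not values from the page:

```python
# Hypothetical row values standing in for the scraped table cells.
rows = ["Stable A", "Stable B", "Stable C"]

data = []
info = {}  # created once, outside the loop
for name in rows:
    info["Name"] = name  # overwrites the key on the same dict
    data.append(info)    # appends another reference to that dict

# Every entry is the very same object, so all of them show the last value.
same_object = all(entry is info for entry in data)
names = [entry["Name"] for entry in data]
print(same_object, names)
```

The list has three entries, but they all refer to one dict, which is why the output repeats the last row.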

What you can do is either (re)create your info dict at each iteration of the outermost for-loop:

for tr in tbl.findAll('tr')[3:]:
    info = {}
    ...

or append a copy of your dict into your list:

data.append(info.copy())

Either way, a new dict object is created for each row.
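Both fixes can be compared side by side in a small sketch, again using made-up row names rather than the scraped data:

```python
# Hypothetical row values standing in for the scraped table cells.
rows = ["Stable A", "Stable B", "Stable C"]

# Fix 1: create a fresh dict inside the loop.
fresh = []
for name in rows:
    info = {}
    info["Name"] = name
    fresh.append(info)

# Fix 2: keep one working dict, but append a copy of it.
copied = []
shared = {}
for name in rows:
    shared["Name"] = name
    copied.append(shared.copy())

fresh_names = [entry["Name"] for entry in fresh]
copied_names = [entry["Name"] for entry in copied]
print(fresh_names, copied_names)
```

Note that dict.copy() makes a shallow copy. That is enough here because the values are plain strings, but nested lists or dicts inside info would still be shared between the copies.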


You can also simplify your inner for-loops, as looping just to handle a single cell is not really necessary:

for td in tr.findAll('td')[0]:
    value= td.string
    info['Name']=value
for td in tr.findAll('td')[1]:
    value= td.string
    info['City']=value
for td in tr.findAll('td')[2]:
    value= td.string
    info['Phone']=value
for td in tr.findAll('td')[3]:
    value = "http://www.virginiaequestrian.com/{}".format(td.a['href'])
    info['ListURL']=value

can become

name, city, phone, url = tr.findAll('td')[:4]
info['Name'] = name.string
info['City'] = city.string
info['Phone'] = phone.string
info['ListURL'] = "http://www.virginiaequestrian.com/{}".format(url.a['href'])
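Putting the two suggestions together, the whole loop might look like the sketch below. It parses a small inline table instead of the live page, so the HTML, the cell values and the href targets are all invented for illustration. It also assumes Python's built-in "html.parser" rather than html5lib, and it adds a length check because the unpacking raises ValueError on any row with fewer than four cells:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
HTML = """
<table>
  <tr><td>Stable A</td><td>Richmond</td><td>555-0100</td><td><a href="list.cfm?id=1">View</a></td></tr>
  <tr><td>Stable B</td><td>Norfolk</td><td>555-0101</td><td><a href="list.cfm?id=2">View</a></td></tr>
</table>
"""
BASE = "http://www.virginiaequestrian.com/{}"

soup = BeautifulSoup(HTML, "html.parser")
data = []
for tr in soup.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) < 4:
        continue  # skip rows that lack the four expected cells
    name, city, phone, url = cells[:4]
    # A new dict literal per row, so no entry shares state with another.
    data.append({
        "Name": name.string,
        "City": city.string,
        "Phone": phone.string,
        "ListURL": BASE.format(url.a["href"]),
    })
print(data)
```

Building the dict as a literal inside the loop makes the "new object per row" fix automatic, since there is no outer info variable left to reuse by accident.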


 