Python web scraper - what have I done wrong?

I am a newbie building a web scraper that will grab (and eventually export to CSV) all the UK McDonald's addresses, postcodes and phone numbers. I am using an aggregator instead of the McDonald's website:

https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/

I have borrowed and repurposed some code:

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; in Python 2 this was: from urllib2 import urlopen

BASE_URL = "https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    boccat = soup.find("tr")
    category_links = [BASE_URL + tr.a["href"] for tr in boccat.findAll("h2")]
    return category_links

def get_restaurant_details(category_url):
    html = urlopen(category_url).read()
    soup = BeautifulSoup(html, "lxml")
    streetAddress = soup.find("span", "streetAddress").string
    addressLocality = [h2.string for h2 in soup.findAll("span", "addressLocality")]
    addressRegion = [h2.string for h2 in soup.findAll("span", "addressRegion")]
    postalCode = [h2.string for h2 in soup.findAll("span", "postalCode")]
    phoneNumber = [h2.string for h2 in soup.findAll("td", "b")]
    return {"streetAddress": streetAddress,
            "addressLocality": addressLocality,
            "postalCode": postalCode,
            "addressRegion": addressRegion,
            "phoneNumber": phoneNumber}

I don't think I have grabbed the data, because when I run the following line:

print(postalCode)

or

print(addressLocality)

I get the following error:

NameError: name 'postalCode' is not defined

Any idea what I'm doing wrong?

As others have commented, the first thing you need to do is actually call your functions.

Do something like this:

if __name__ == '__main__':
    res = "https://www.localstore.co.uk/store/329213/mcdonalds-restaurant/london/"
    print(get_restaurant_details(res)["postalCode"])

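after your two functions. I took a URL from the site that should work with your program, but I have not tested it. The main problem right now is that your script only defines the functions and never calls them, so nothing runs. Note also that postalCode and addressLocality are local variables inside get_restaurant_details, so they do not exist at module level, which is why print(postalCode) raises the NameError. Read the values from the dictionary the function returns instead.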
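To see the scoping issue in isolation, here is a minimal sketch that needs no network access. The function get_restaurant_details_stub and the postcode "AB1 2CD" are made-up placeholders standing in for the real scraper and its data; only the pattern (call the function, then read from the returned dictionary) is the point.

```python
def get_restaurant_details_stub():
    # Hypothetical stand-in for get_restaurant_details: no scraping happens here.
    # postalCode is a local variable; it only exists while the function runs.
    postalCode = ["AB1 2CD"]  # made-up placeholder value
    return {"postalCode": postalCode}


# Using the local name outside the function fails, just like in the question.
error_message = None
try:
    print(postalCode)
except NameError as exc:
    error_message = str(exc)
    print("NameError:", error_message)

# Calling the function and indexing its return value works.
details = get_restaurant_details_stub()
print(details["postalCode"])
```

The same idea applies to the real code: keep the result of get_restaurant_details(url) in a variable and look up "postalCode" or "addressLocality" in it.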

