Python web scraper - what have I done wrong?

I am a newbie and am building a web scraper that will grab (and eventually export to CSV) the addresses, postcodes and phone numbers of all UK McDonald's restaurants. I am using an aggregator instead of the McDonald's website.

https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/

I have borrowed and repurposed some code:

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    boccat = soup.find("tr")
    category_links = [BASE_URL + tr.a["href"] for tr in boccat.findAll("h2")]
    return category_links

def get_restaurant_details(category_url):
    html = urlopen(category_url).read()
    soup = BeautifulSoup(html, "lxml")
    streetAddress = soup.find("span", "streetAddress").string
    addressLocality = [h2.string for h2 in soup.findAll("span", "addressLocality")]
    addressRegion = [h2.string for h2 in soup.findAll("span", "addressRegion")]
    postalCode = [h2.string for h2 in soup.findAll("span", "postalCode")]
    phoneNumber = [h2.string for h2 in soup.findAll("td", "b")]
    return {"streetAddress": streetAddress,
            "addressLocality": addressLocality,
            "postalCode": postalCode,
            "addressRegion": addressRegion,
            "phoneNumber": phoneNumber}

I don't think I have grabbed the data, because when I run the following line:

print(postalCode)

or

print(addressLocality)

I get the following error:

NameError: name 'postalCode' is not defined

Any idea what I'm doing wrong?

As others have commented, the first thing you need to do is actually call your functions.

Do something like this:

if __name__ == '__main__':
    res = "https://www.localstore.co.uk/store/329213/mcdonalds-restaurant/london/"
    print(get_restaurant_details(res)["postalCode"])

Put this after your two functions. I just went on the site and picked a URL that should work for your program, but I never actually tested it. The main problem you have right now is that you aren't actually doing anything: defining a function does not run it, and names such as postalCode are local to get_restaurant_details, so they only reach the rest of your script through the dictionary the function returns. You need to call a function!
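To see why the NameError appears without hitting the site, here is a minimal sketch. The function get_details_stub, the example.com URL, the sample values and rows_to_csv are all made up for illustration; the stub only mimics the shape of what get_restaurant_details returns and makes no network request. It also shows one possible way to do the CSV export mentioned in the question, assuming you are on Python 3.

```python
import csv
import io


def get_details_stub(url):
    """Hypothetical stand-in for get_restaurant_details(); it makes no network request."""
    # These names are local: they exist only while the function is running.
    postalCode = ["AB1 2CD"]            # placeholder value, not scraped data
    addressLocality = ["Exampletown"]   # placeholder value, not scraped data
    return {"url": url, "postalCode": postalCode, "addressLocality": addressLocality}


# Defining the function does not create postalCode at module level,
# so reading it here raises the same NameError as in the question.
try:
    print(postalCode)
except NameError as exc:
    error_message = str(exc)
    print("NameError:", error_message)

# Calling the function and indexing the returned dict is what works.
details = get_details_stub("https://example.com/store/1/")
print(details["postalCode"])


def rows_to_csv(rows):
    """Write a list of detail dicts to CSV text, joining list values with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "postalCode", "addressLocality"])
    writer.writeheader()
    for row in rows:
        writer.writerow({key: "; ".join(value) if isinstance(value, list) else value
                         for key, value in row.items()})
    return buffer.getvalue()


csv_text = rows_to_csv([details])
print(csv_text)
```

To write a real file instead of an in-memory string, you could pass a file opened with open(path, "w", newline="") to csv.DictWriter in place of the StringIO buffer.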

