
Hey! I am new to coding and just got an error that I can't figure out, so please help me out

Here is the code. Please take a look and tell me what mistake I made. It was written to scrape a web page with Python.

from email import header
import random
import time
import urllib.request
from bs4 import BeautifulSoup
import requests

main_url = "http://www.google.com"

main_page_html  = tryAgain(main_url)
main_page_soup = BeautifulSoup(main_page_html)

# Scrape all TDs from TRs inside Table
for tr in main_page_soup.select("table.class_of_table"):
   for td in tr.select("td#id"):
       print(td.text)
       # For acnhors inside TD
       print(td.select("a")[0].text)
       # Value of Href attribute
       print(td.select("a")[0]["href"])

# This is method that scrape URL and if it doesnt get scraped, waits for 20 seconds and then tries again. (I use it because my internet connection sometimes get disconnects)

def tryAgain(passed_url):
    try:
        page  = requests.get(passed_url,headers = random.choice(header), timeout = timeout_time).text
        return page
    except Exception:
        while 1:
            print("Trying again the URL:")
            print(passed_url)
            try:
                page  = requests.get(passed_url,headers = random.choice(header), timeout = timeout_time).text
                print("-------------------------------------")
                print("---- URL was successfully scraped ---")
                print("-------------------------------------")
                return page
            except Exception:
                time.sleep(20)
                continue 

This is the error I got.

I expected it to run fine, since I could not spot any mistake on my side. Please let me know if there is one.

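You can think of it this way: Python, unlike some compiled languages, interprets things line by line, from top to bottom. So when it executes a given line, only the lines above it exist as far as Python is concerned (this is a simplification, but a useful one). With that in mind, what happens if you run this?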

from email import header
import random
import time
import urllib.request
from bs4 import BeautifulSoup
import requests

main_url = "http://www.google.com"

main_page_html  = tryAgain(main_url)

Of course it says that tryAgain is not defined. You need to move the call below the definition (or the definition above the call).
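The top-to-bottom behaviour can be seen in a minimal sketch that needs no network access. The function name `greet` is made up for illustration; the only point is that the name does not exist until its `def` statement has run.

```python
# Calling a function before its "def" statement has executed fails,
# because the name has not been bound yet.
try:
    result_before = greet("world")
except NameError as exc:
    result_before = None
    error_message = str(exc)


def greet(name):
    return f"Hello, {name}!"


# After the definition has run, the same call works.
result_after = greet("world")
print(error_message)
print(result_after)
```

The first call fails with a `NameError`, while the identical call placed after the `def` succeeds.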

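Python is a scripting language: statements run in the order they appear. Always define your functions and classes before calling them.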
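One common way to sidestep this ordering problem is to put the top-level logic in a `main()` function and call it at the very bottom of the file. The sketch below uses hypothetical names (`main`, `fetch_page`) and a stubbed fetch instead of a real request. It works because the names used inside `main()` are only looked up when `main()` is actually called, and by then every `def` in the file has run.

```python
def main():
    # fetch_page is defined further down, but that is fine: the name is
    # looked up when main() runs, not when main() is defined.
    html = fetch_page("http://www.google.com")
    return len(html)


def fetch_page(url):
    # Stub standing in for a real HTTP request (an assumption for this demo).
    return f"<html><body>{url}</body></html>"


if __name__ == "__main__":
    print(main())
```

With this layout the helper functions can appear in any order, as long as the call to `main()` stays at the end of the file.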

When execution of your code reaches the following line

main_page_html  = tryAgain(main_url)

Python cannot find the function "tryAgain", because it is defined later in the code.

Do this instead:

import random
import time

import requests
from bs4 import BeautifulSoup

main_url = "http://www.google.com"

# Assumed values: the original code never defines these. There, "header" was
# the email.header module (probably an accidental auto-import), which
# random.choice() cannot pick from, and timeout_time did not exist at all.
header = [{"User-Agent": "Mozilla/5.0"}]
timeout_time = 10  # seconds


# Fetch a URL; if the request fails, wait 20 seconds and then try again.
# (Useful when the internet connection sometimes drops.)
def tryAgain(passed_url):
    try:
        page = requests.get(passed_url, headers=random.choice(header), timeout=timeout_time).text
        return page
    except requests.RequestException:
        while 1:
            print("Trying again the URL:")
            print(passed_url)
            try:
                page = requests.get(passed_url, headers=random.choice(header), timeout=timeout_time).text
                print("-------------------------------------")
                print("---- URL was successfully scraped ---")
                print("-------------------------------------")
                return page
            except requests.RequestException:
                time.sleep(20)
                continue


# Call the function only after it has been defined, and build the soup
# before the loop below uses it.
main_page_html = tryAgain(main_url)
main_page_soup = BeautifulSoup(main_page_html, "html.parser")

# Scrape all TDs from TRs inside Table
for tr in main_page_soup.select("table.class_of_table"):
    for td in tr.select("td#id"):
        print(td.text)
        # For anchors inside TD
        print(td.select("a")[0].text)
        # Value of Href attribute
        print(td.select("a")[0]["href"])
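
The retry logic in `tryAgain` could also be written as a single loop with a cap on the number of attempts, so that a permanently failing URL does not loop forever. This is only a sketch under assumptions: `fetch_with_retry`, `max_attempts`, and `wait_seconds` are names invented here, and the fetch function is passed in as a parameter so the example runs without a network connection.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=3, wait_seconds=20):
    """Call fetch(url), retrying on failure up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real scraper would catch requests.RequestException
            last_error = exc
            print(f"Attempt {attempt} failed for {url}: {exc}")
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f"Giving up on {url}") from last_error


# Demo with a fake fetch function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "<html>ok</html>"


page = fetch_with_retry(flaky_fetch, "http://www.google.com", wait_seconds=0)
print(page)
```

With the requests library, the `fetch` argument could be something like `lambda u: requests.get(u, timeout=10).text`.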

