
Trouble executing my class crawler

I'm completely new to Python when it comes to scraping web data using a class, so apologies in advance for any serious mistakes. I've written a script that parses the text of the a tags on the Wikipedia main page. I tried my best to write the code correctly, but for some reason it throws an error when I execute it. The code and the error are given below.

The script:

import requests
from lxml.html import fromstring

class TextParser(object):

    def __init__(self):
        self.link = 'https://en.wikipedia.org/wiki/Main_Page'
        self.storage = None

    def fetch_url(self):
        self.storage = requests.get(self.link).text

    def get_text(self):
        root = fromstring(self.storage)
        for post in root.cssselect('a'):
            print(post.text)

item = TextParser()
item.get_text()

The error:

Traceback (most recent call last):
  File "C:\Users\mth\AppData\Local\Programs\Python\Python35-32\testmatch.py", line 38, in <module>
    item.get_text()
  File "C:\Users\mth\AppData\Local\Programs\Python\Python35-32\testmatch.py", line 33, in get_text
    root = fromstring(self.storage)
  File "C:\Users\mth\AppData\Local\Programs\Python\Python35-32\lib\site-packages\lxml\html\__init__.py", line 875, in fromstring
    is_full_html = _looks_like_full_html_unicode(html)
TypeError: expected string or bytes-like object

You're executing the following two lines:

item = TextParser()
item.get_text()

When you initialize TextParser, self.storage is set to None. Nothing assigns to it after that, so it is still None when you call get_text(), and fromstring() receives None instead of a string. That is why you get the TypeError.
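
The failure can be reproduced without lxml or a network connection. The sketch below assumes that the check named in the traceback (_looks_like_full_html_unicode) behaves like a compiled regular expression's match method; the pattern used here is an approximation, not necessarily the one lxml ships.

```python
import re

# Assumption: a stand-in for the regex check that lxml's fromstring()
# runs on its input; the exact pattern inside lxml may differ.
looks_like_full_html = re.compile(r'^\s*<(?:html|!doctype)', re.I).match

storage = None  # what self.storage holds before fetch_url() has run

try:
    looks_like_full_html(storage)
    error = None
except TypeError as exc:
    # A regex cannot be matched against None, hence the TypeError.
    error = exc
    print('TypeError:', exc)

# A real string goes through the same check without raising.
print(bool(looks_like_full_html('<html><body></body></html>')))
```

Any value that is not a string or bytes object would fail the same way, so the fix has to make sure self.storage holds the page text before it is parsed.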

However, if you change it to the following, self.storage gets populated with a string instead of staying None:

item = TextParser()
item.fetch_url()
item.get_text()

If you want to call get_text() without calling fetch_url() first, you can have get_text() call it for you:

def get_text(self):
    self.fetch_url()
    root = fromstring(self.storage)
    for post in root.cssselect('a'):
        print(post.text)
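
One possible refinement of this approach is to fetch only when self.storage is still None, so that repeated calls to get_text() do not download the page again. The following is a minimal sketch under a few assumptions: the fetcher argument and fake_fetcher are hypothetical names introduced so the example runs offline, and the standard library's html.parser stands in for lxml's cssselect('a'), so the collected text may differ slightly from what post.text returns.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects the text inside <a> tags (stdlib stand-in for cssselect('a'))."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_link = True
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts[-1] += data


class TextParser(object):

    def __init__(self, fetcher):
        # fetcher is a hypothetical callable that returns the page HTML;
        # in the original script it would wrap requests.get(self.link).text
        self.fetcher = fetcher
        self.storage = None

    def fetch_url(self):
        self.storage = self.fetcher()

    def get_text(self):
        if self.storage is None:  # download only if nothing is stored yet
            self.fetch_url()
        collector = LinkTextCollector()
        collector.feed(self.storage)
        collector.close()
        return collector.texts


calls = []


def fake_fetcher():
    # Hypothetical offline replacement for the HTTP request.
    calls.append(1)
    return '<p><a href="/a">First</a> and <a href="/b">Second</a></p>'


item = TextParser(fake_fetcher)
print(item.get_text())  # fetches, then parses
print(item.get_text())  # reuses the stored HTML
print(len(calls))       # number of fetches performed
```

Returning the list instead of printing inside the method also makes the result easier to reuse, at the cost of departing a little from the original script.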
