Python Scrapy crawls the links but does not scrape anything

Question

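I am new to the Scrapy framework and am trying to learn web scraping. I have a txt file containing links to pages of a website. I build a list from those links and store it in start_urls, but the parse function does not work, so nothing gets scraped.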

Here is the code:

try:
    import scrapy
except ImportError:
    print("\nERROR IMPORTING THE NECESSARY LIBRARIES\n")

#File with all the links
crimefile = open('links.txt', 'r')
#making a list with all the links
yourResult = [line for line in crimefile.readlines()]

class SpiderMan(scrapy.Spider):
    name = 'man spider'

    #making start_urls equal to that list
    start_urls = yourResult

    def parse(self, response):
        SET_SELECTOR = '.c411Listing.jsResultsList'
        for man in response.css(SET_SELECTOR):
            name = '.c411ListedName a ::text'
            address = '.adr ::text'
            phone = '.c411Phone ::text'
            yield { 

                    'NAME': man.css(name).extract_first(),
                    'ADDRESS': man.css(address).extract_first(),
                    'PHONE': man.css(phone).extract_first(),
                    }

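Judging by the output, the spider is crawling every link, but for some reason the parse function is not working.

What am I doing wrong in this simple code?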

Answer 1

The problem is that your URLs end with "%0D%0A". If you paste one of the URLs from the Scrapy log into a browser, you get the following screen:

“The postal code entered is incorrectly formatted.”

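"%0D%0A" is the URL-encoded form of the line break (carriage return plus line feed) in your URL file. These characters were kept when the file was loaded and split into lines. Remove them and you will be fine.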
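To see where that suffix comes from, here is a small sketch using only the standard library. The URL is a made-up placeholder (not one from links.txt), and urllib.parse.quote only approximates the escaping Scrapy applies internally, so treat it as an illustration rather than what Scrapy literally runs.

```python
from urllib.parse import quote

# Hypothetical line as it might be read from a links file with Windows line endings
raw_line = "http://example.com/listing?pc=A1A1A1\r\n"

# Percent-encode everything except the usual URL punctuation;
# the trailing carriage return and line feed become %0D and %0A
encoded = quote(raw_line, safe=":/?=&")
print(encoded)  # http://example.com/listing?pc=A1A1A1%0D%0A

# After strip() the line break is gone and nothing needs escaping
cleaned = raw_line.strip()
print(quote(cleaned, safe=":/?=&"))  # http://example.com/listing?pc=A1A1A1
```

strip() removes both "\r" and "\n", so it works whichever line ending the file uses.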

Easy fix: add a call to strip():

yourResult = [line.strip() for line in crimefile.readlines()]
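If you want something a bit more defensive, the loading step could be wrapped in a small helper along these lines. The name load_urls is my own invention, not part of Scrapy. This sketch also closes the file when done and skips blank lines, which would otherwise end up as empty entries in start_urls.

```python
def load_urls(path):
    """Read one URL per line, dropping surrounding whitespace and blank lines."""
    with open(path, "r") as url_file:
        # strip() removes the trailing "\r\n" or "\n" left on each line
        stripped = (line.strip() for line in url_file)
        # keep only lines that still contain something after stripping
        return [url for url in stripped if url]


# Usage, assuming links.txt sits next to the spider as in the question:
# start_urls = load_urls('links.txt')
```

Iterating over the file object directly does the same job as readlines() without building an intermediate list first.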

Answer 1: accepted, 2017-04-25 19:39:07
