Scrapy fails to read URLs from a txt file

Question

This is what the txt file looks like when I open it from Jupyter Notebook. Please note that, for obvious reasons, I changed the names of the links in the result.

Input - - - - - - - - - - - - - - -

with open('...\j.txt', 'r') as f:
    data = f.readlines()

print(data[0])
print(type(data))

Output - - - - - - - - - - - - - - - - -

['https://www.example.com/191186976.html', 'https://www.example.com/191187171.html']

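Now I have written the following in my Scrapy script, but when I run it, it does not go to the links. Instead, it shows: ERROR: Error while obtaining start requests.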

class abc(scrapy.Spider):
    name = "abc_article"

    with open('j.txt', 'r') as f4:
        url_c = f4.readlines()

    u = url_c[0]
    start_urls = u

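If I write u = ['example.html', 'example.html'] and start_urls = u, then it works perfectly fine. I am new to Scrapy, so I would like to ask what the problem is here. Is it the way I read the file, or something else I have not noticed? Thanks.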

Answer 1

Something like this should get you headed in the right direction.

import csv
from urllib.request import urlopen

from bs4 import BeautifulSoup

contents = []
# Open the CSV file in read mode; newline='' is what the csv module recommends.
with open('C:\\your_path_here\\test.csv', 'r', newline='') as csvf:
    urls = csv.reader(csvf)
    for url in urls:
        if url:  # Skip empty rows.
            contents.append(url)  # Each row is a list; the URL is in column 0.

for url in contents:  # Fetch and parse each URL in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
    print(soup)  # Print inside the loop so every page is shown, not only the last.
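Judging only from the printed output in the question, one likely cause is that url_c[0] is a single string that merely looks like a list. In that case start_urls ends up as a string, and Scrapy would walk it character by character instead of URL by URL. Below is a minimal sketch of one way to turn the file into a real list. It assumes the file holds either a Python list literal on one line or one plain URL per line; load_start_urls is a hypothetical helper name, not part of Scrapy.

```python
import ast
import os
import tempfile


def load_start_urls(path):
    """Return a list of URL strings read from a text file.

    Handles two layouts (both are assumptions about what j.txt may hold):
    a line that is a Python list literal, or one plain URL per line.
    """
    urls = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines.
            if line.startswith("["):
                # The line looks like "['url1', 'url2']": parse it safely
                # into a real list instead of keeping it as one string.
                urls.extend(u.strip() for u in ast.literal_eval(line))
            else:
                urls.append(line)
    return urls


# Small self-check with a temporary file shaped like the output in the question.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(
        "['https://www.example.com/191186976.html', "
        "'https://www.example.com/191187171.html']\n"
    )
    demo_path = tmp.name

demo_urls = load_start_urls(demo_path)
os.remove(demo_path)

print(type(demo_urls).__name__, len(demo_urls))
print(demo_urls[0])
```

Inside the spider, assigning start_urls = load_start_urls('j.txt') in the class body should then hand Scrapy a list of URLs rather than one long string. If the file can simply be rewritten with one URL per line, the list-literal branch is not needed at all.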
