
Scrapy reading URLs from a txt file fails

This is what the txt file looks like; I opened it from a Jupyter notebook. Note that I changed the names of the links in the result for obvious reasons.

input:

with open('...\j.txt', 'r') as f:
    data = f.readlines()

print(data[0])
print(type(data))

output:

[' https://www.example.com/191186976.html ', ' https://www.example.com/191187171.html ']

Now I wrote this in my Scrapy script, but it didn't go for the links when I ran it. Instead it shows: ERROR: Error while obtaining start requests.

class abc(scrapy.Spider):
    name = "abc_article"

    with open('j.txt', 'r') as f4:
        url_c = f4.readlines()

    u = url_c[0]
    start_urls = u

And if I write u = ['example.html', 'example.html'] and start_urls = u, then it works perfectly fine. I'm new to Scrapy, so I'd like to ask: what is the problem here? Is it the reading method, or something else I didn't notice? Thanks.
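The symptom points at the reading method: readlines() returns one string per line, so url_c[0] is the entire first line of the file as a single string, brackets and quotes included. Assigning that string to start_urls means Scrapy never gets a list of URLs, which would explain the "Error while obtaining start requests". A minimal sketch of a fix, assuming the file really contains the Python list literal shown in the output above (the parse callback below is a placeholder, not from the question):

import ast

import scrapy


class abc(scrapy.Spider):
    name = "abc_article"

    with open('j.txt', 'r') as f4:
        # The first line is the text "['https://...', 'https://...']";
        # parse it into a real Python list instead of keeping it as one string.
        start_urls = [u.strip() for u in ast.literal_eval(f4.readline())]

    def parse(self, response):
        yield {'url': response.url}

If the file instead held one plain URL per line, [line.strip() for line in f4 if line.strip()] would build start_urls just as well.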

Something like this should get you going in the right direction.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('C:\\your_path_here\\test.csv', 'r') as csvf:  # open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # add each row (a list of fields) to contents

for url in contents:  # fetch and parse each url in the list
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
    print(soup)
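And if the end goal is still a Scrapy crawl rather than scraping with BeautifulSoup, the same row-by-row CSV reading can feed the spider directly. A sketch along those lines using Scrapy's standard start_requests() hook (the file path is the placeholder from above, and the parse body is an assumption):

import csv

import scrapy


class CsvUrlSpider(scrapy.Spider):
    name = "csv_urls"

    def start_requests(self):
        # One URL per row, taken from the first column of the CSV file.
        with open('C:\\your_path_here\\test.csv', 'r') as csvf:
            for row in csv.reader(csvf):
                if row:  # skip blank lines
                    yield scrapy.Request(url=row[0].strip(), callback=self.parse)

    def parse(self, response):
        yield {'url': response.url}

Reading the file inside start_requests(), rather than at class-definition time as in the question, also means the file is only opened when the spider actually runs.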
