This is what the txt file looks like; I opened it from a Jupyter notebook. Note that I changed the links in the result for obvious reasons. input-----------------------------
with open('...\j.txt', 'r') as f:
    data = f.readlines()
print(data[0])
print(type(data))
output---------------------------------
[' https://www.example.com/191186976.html ', ' https://www.example.com/191187171.html ']
Now I put this in my scrapy script, but it didn't request the links when I ran it. Instead it showed: ERROR: Error while obtaining start requests.
class abc(scrapy.Spider):
    name = "abc_article"
    with open('j.txt', 'r') as f4:
        url_c = f4.readlines()
    u = url_c[0]
    start_urls = u
And if I write u = ['example.html', 'example.html'] and start_urls = u, then it works perfectly fine. I'm new to scrapy, so I'd like to ask what the problem is here. Is it the reading method, or something else I didn't notice? Thanks.
Something like this should get you going in the right direction.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('C:\\your_path_here\\test.csv', 'r') as csvf:  # open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # add each url row to the list

for url in contents:  # fetch and parse each url in the list
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
    print(soup)
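As for why the original spider fails: readlines() returns a list of strings, so url_c[0] is the entire first line of j.txt as a single string, and the file's first line is itself the text of a Python-style list. Assigning that string to start_urls makes scrapy iterate over it character by character, which is why it errors out on start requests. A minimal sketch of the fix, assuming j.txt really does contain a one-line list literal as shown in the question's output:

```python
import ast

# What readlines()[0] gives you: the *string* form of a list, not a list.
line = "[' https://www.example.com/191186976.html ', ' https://www.example.com/191187171.html ']"

# Parse the literal into a real list, then strip the padding whitespace
# so each entry is a valid URL scrapy can request.
urls = [u.strip() for u in ast.literal_eval(line)]
print(urls)
```

In the spider you would then do start_urls = urls (or build the list inside start_requests). If the file is instead one plain URL per line, skip ast.literal_eval and just do [line.strip() for line in f].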