Is there a simple way to read lines from a text file into this Beautiful Soup Python script?
How can I read the lines of a .txt file into this script instead of listing the URLs in the script itself? Thanks.
from bs4 import BeautifulSoup
import requests
url = "http://www.url1.com"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')
categories = soup.find_all("a", {"class": 'navlabellink nvoffset nnormal'})
for category in categories:
    print(url + "," + category.text)
The contents of my text.file are separated by newlines:
http://www.url1.com
http://www.url2.com
http://www.url3.com
http://www.url4.com
http://www.url5.com
http://www.url6.com
http://www.url7.com
http://www.url8.com
http://www.url9.com
with open('text.file', 'r') as file1:
    lines = file1.readlines()
# Strip the trailing newline and number each line (enumerate increments count)
for count, line in enumerate(lines):
    print("Line{}: {}".format(count, line.strip()))
You just need to use each line you read in place of the url variable.
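Because the file holds exactly one URL per line, each line only needs its trailing newline removed before it can be passed to requests.get. A minimal sketch, assuming blank lines (for example an empty line at the end of the file) should simply be skipped; `read_urls` is a hypothetical helper name, and io.StringIO stands in for the real file so the example runs without text.file.

```python
import io


def read_urls(f):
    """Yield each non-blank line of the open file `f`, stripped of whitespace.

    `read_urls` is a hypothetical helper, not part of requests or bs4.
    """
    for line in f:
        url = line.strip()  # drops the trailing "\n" (and any "\r" or spaces)
        if url:             # skip empty lines, e.g. a blank line at the end
            yield url


# Stand-in for open("text.file", "r"): same layout, one URL per line.
sample = io.StringIO("http://www.url1.com\nhttp://www.url2.com\n\n")
urls = list(read_urls(sample))
print(urls)
```

With the real file, you would open it with `with open('text.file', 'r') as f:` and wrap the question's scraping code in `for url in read_urls(f):`, so `url` keeps the same meaning it had in the original script.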
To read the URLs from a.txt, you can use the following script:
import requests
from bs4 import BeautifulSoup
with open('a.txt', 'r') as f_in:
    # map(str.strip, ...) removes the newline and surrounding spaces from each line
    for line in map(str.strip, f_in):
        if not line:  # skip blank lines
            continue
        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')
        categories = soup.find_all("a", {"class": 'navlabellink nvoffset nnormal'})
        for category in categories:
            # `line` holds the current URL (the original `url` was undefined here)
            print(line + "," + category.text)
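The script joins the URL and the category text with a plain ",". If a category label itself contains a comma, the output can no longer be split back into its two fields reliably. A minimal sketch, assuming you want proper CSV output, uses the standard library's csv module, which quotes such fields; `write_rows` is a hypothetical helper name and the sample rows are made up. Inside the scraping loop you would call `writer.writerow([line, category.text])` instead of `print`.

```python
import csv
import io
import sys


def write_rows(rows, out=sys.stdout):
    """Write (url, category) pairs as CSV to `out`.

    Hypothetical helper: csv.writer quotes any field that contains a comma.
    """
    # "\n" terminator: text-mode streams such as sys.stdout on Windows
    # already translate it to "\r\n", so the default "\r\n" would double the "\r".
    writer = csv.writer(out, lineterminator="\n")
    for url, category in rows:
        writer.writerow([url, category])


# Made-up rows standing in for (line, category.text) pairs from the scraper.
buf = io.StringIO()
write_rows([("http://www.url1.com", "Books, Music"), ("http://www.url2.com", "Toys")], buf)
print(buf.getvalue(), end="")
```

Reading the result back with `csv.reader` recovers the original two fields per row, even when the category text contains commas.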
For this example, assume your file is named urls.txt. In Python, opening a file and reading its contents is easy:
with open('urls.txt', 'r') as f:
    urls = f.read().splitlines()
# Your list of URLs is now in the urls list!
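The 'r' after 'urls.txt' tells Python to open the file in read mode. If you don't need to modify the file, it's best to open it read-only. f.read() returns the entire contents of the file, including the newline characters (\n), so splitlines() splits the text at those line breaks, drops them, and gives you a list of lines.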
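To see the difference described above, compare readlines(), which the question's own snippet uses, with read().splitlines() on the same text. This small sketch uses io.StringIO in place of urls.txt; the sample contents are made up.

```python
import io

contents = "http://www.url1.com\nhttp://www.url2.com\n"

# readlines() keeps the newline at the end of every line ...
kept = io.StringIO(contents).readlines()
# ... while read().splitlines() drops the line terminators.
split = io.StringIO(contents).read().splitlines()

print(kept)   # ['http://www.url1.com\n', 'http://www.url2.com\n']
print(split)  # ['http://www.url1.com', 'http://www.url2.com']

# A blank line in the middle of the file still shows up as an empty string,
# so it is worth filtering those out before calling requests.get.
print("a\n\nb\n".splitlines())  # ['a', '', 'b']
```

Stripping each element of `kept` gives the same list as `split`, which is why both the readlines() + strip() approach and splitlines() work here.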