Python issues parsing through a list from an imported csv
I've pieced together a script that runs through an imported list of urls and grabs all the "p" tags from the html section that has the class "holder". It works, but it only looks at the first url in the imported CSV:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
contents = []

with open('list.csv','r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")

n = 0
for container in soup.find_all("section", attrs={'class': 'holder'}):
    n += 1
    print('==', 'Section', n, '==')
    for paragraph in container.find_all("p"):
        print(paragraph)
Any ideas how I can get it to loop through each url instead of just one?
The problem is the indentation of your code. The correct version is:
contents = []

with open('list.csv','r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    n = 0
    for container in soup.find_all("section", attrs={'class': 'holder'}):
        n += 1
        print('==', 'Section', n, '==')
        for paragraph in container.find_all("p"):
            print(paragraph)
Otherwise, you extract the "p" tags only from the last URL, because soup keeps the value assigned on the last iteration of the previous loop.
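The effect can be seen with a minimal sketch, where plain strings stand in for the fetched pages and `.upper()` stands in for BeautifulSoup parsing:

```python
# Minimal sketch of the bug: when parsing happens after the loop,
# only the value from the final iteration is still bound to the variable.
pages = ["<p>one</p>", "<p>two</p>", "<p>three</p>"]
for page in pages:
    parsed = page.upper()  # stand-in for soup = BeautifulSoup(page, "lxml")

# The loop has finished; `parsed` reflects only the last page.
print(parsed)  # -> <P>THREE</P>
```

Indenting the parsing step inside the loop makes it run once per page instead of once at the end.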
You have to indent the for container in soup.find_all(): block so it runs inside the loop over the urls. Try something like this:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

with open('list.csv','r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        page = urlopen(url[0]).read()  # each csv row is a list, so take its first field
        soup = BeautifulSoup(page, "lxml")
        n = 0
        for container in soup.find_all("section", attrs={'class': 'holder'}):
            n += 1
            print('==', 'Section', n, '==')
            for paragraph in container.find_all("p"):
                print(paragraph)
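Note that csv.reader yields each row as a list of strings, even for a one-column file, which is why the code indexes url[0] instead of passing the row itself to urlopen. A quick sketch, using an in-memory file in place of list.csv:

```python
import csv
import io

# csv.reader yields each row as a list of strings, even for a
# one-column file, so the URL is the first (and only) field.
fake_csv = io.StringIO("http://example.com/a\nhttp://example.com/b\n")
for row in csv.reader(fake_csv):
    print(type(row).__name__, row[0])  # -> list http://example.com/a  (etc.)
```

Passing the list itself to urlopen would raise an error, since urlopen expects a string URL (or a Request object).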