Python: Web Scraping Weird Output
from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
import requests

url = 'https://en.wikisource.org/wiki/Main_Page'
r = requests.get(url)
Soup = BeautifulSoup(r.text, "html5lib")
List = Soup.find("div",class_="enws-mainpage-widget-content", id="enws-mainpage-newtexts-content").find_all('a')
ebooks=[]
i=0
for ebook in List:
    x=ebook.get('title')
    for ch in x:
        if(ch==":"):
            x=""
    if x!="":
        ebooks.append(x)
        i=i+1
inputnumber=0
while inputnumber<len(ebooks):
    print(inputnumber+1, " - ", ebooks[inputnumber])
    inputnumber=inputnumber+1
input=int(input("Please select a book: "))
selectedbook = Soup.find("a", title=ebooks[input-1])
print(selectedbook['title'])
url1 = "https://en.wikisource.org/"+selectedbook['href']
print(url1)
r1 = requests.get(url1)
Soup1 = BeautifulSoup(r1.text, "html5lib")
List1 = Soup.find("div", class_="prp-pages-output")
print(List1)

This is my code. In the last part, I want to get the paragraphs from the HTML of the selected book's page. But as output I get:
1 - The Center of the Web
2 - Bobby Bumps Starts a Lodge
3 - May (Mácha)
4 - Animal Life and the World of Nature/1903/06/Notes and Comments
5 - The Czechoslovak Review/Volume 2/No Compromise
6 - She's All the World to Me
7 - Their One Love
Please select a book: 4
Animal Life and the World of Nature/1903/06/Notes and Comments
https://en.wikisource.org//wiki/Animal_Life_and_the_World_of_Nature/1903/06/Notes_and_Comments
None
Why is List1 returned as None? It shouldn't be. Can someone tell me what I am doing wrong?
I guess you just typed Soup where you meant Soup1. Also, I think you need more than one result when you are looking for a list of items, so I used find_all() instead of find().
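To see why the typo prints None, here is a small offline sketch. The two HTML strings are hypothetical stand-ins for the main page and the book page (the real Wikisource markup is much larger), and it uses the built-in "html.parser" instead of "html5lib" so that nothing beyond bs4 is required. find() returns None when nothing matches, while find_all() returns an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two downloaded pages.
MAIN_PAGE = (
    '<div id="enws-mainpage-newtexts-content">'
    '<a href="/wiki/Some_Book" title="Some Book">Some Book</a>'
    "</div>"
)
BOOK_PAGE = (
    '<div class="prp-pages-output">'
    "<p>First paragraph.</p><p>Second paragraph.</p>"
    "</div>"
)

soup = BeautifulSoup(MAIN_PAGE, "html.parser")   # main page
soup1 = BeautifulSoup(BOOK_PAGE, "html.parser")  # selected book page

# Searching the wrong soup: the main page has no such div.
wrong = soup.find("div", class_="prp-pages-output")
wrong_all = soup.find_all("div", class_="prp-pages-output")

# Searching the book page finds the div, and its paragraphs can be read out.
right = soup1.find("div", class_="prp-pages-output")
paragraphs = [p.get_text() for p in right.find_all("p")]

print(wrong)       # no match, so find() gives None
print(wrong_all)   # no match, so find_all() gives an empty list
print(paragraphs)
```

Checking the result of find() for None before using it makes this kind of mix-up easier to spot.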
from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
import requests

url = "https://en.wikisource.org/wiki/Main_Page"
r = requests.get(url)
Soup = BeautifulSoup(r.text, "html5lib")
List = Soup.find(
    "div", class_="enws-mainpage-widget-content", id="enws-mainpage-newtexts-content"
).find_all("a")

# Collect the link titles, skipping any title that contains a colon.
ebooks = []
i = 0
for ebook in List:
    x = ebook.get("title")
    for ch in x:
        if ch == ":":
            x = ""
    if x != "":
        ebooks.append(x)
        i = i + 1

# Print a numbered menu of the collected titles.
inputnumber = 0
while inputnumber < len(ebooks):
    print(inputnumber + 1, " - ", ebooks[inputnumber])
    inputnumber = inputnumber + 1

# Use a separate name so the built-in input() is not overwritten.
choice = int(input("Please select a book: "))
selectedbook = Soup.find("a", title=ebooks[choice - 1])
print(selectedbook["title"])
url1 = "https://en.wikisource.org/" + selectedbook["href"]
print(url1)

# Parse the selected book's page and search that soup (Soup1), not the main page (Soup).
r1 = requests.get(url1)
Soup1 = BeautifulSoup(r1.text, "html5lib")
List1 = Soup1.find_all("div", class_="prp-pages-output")
print(List1)
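
A side note on the URL: joining the base and the href with + gives the double slash visible in the printed output (https://en.wikisource.org//wiki/...). The request seems to have gone through anyway, but a minimal sketch with the standard library's urljoin avoids it. The names BASE and href below are illustrative, with the href value taken from the output in the question.

```python
from urllib.parse import urljoin

BASE = "https://en.wikisource.org/"
# href as it appears on the main page (it already starts with a slash)
href = "/wiki/Animal_Life_and_the_World_of_Nature/1903/06/Notes_and_Comments"

concatenated = BASE + href       # plain string concatenation keeps both slashes
joined = urljoin(BASE, href)     # resolves href against the base URL

print(concatenated)
print(joined)
```

In the script above, this would mean building url1 with urljoin(url, selectedbook["href"]) instead of string concatenation.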