
Python Web Scraping With Problems

I am using requests-HTML and BeautifulSoup to scrape a website; the code is below. The weird thing is that print(soup.get_text()) sometimes gives me the text from the page, while print(soup) gives me what looks like random code (see the attached image).

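from requests_html import HTMLSession
from bs4 import BeautifulSoup as bs

# url is the address of the page being scraped (not given in the question)
session = HTMLSession()
r = session.get(url)
soup = bs(r.content, "html.parser")
print(soup.get_text())
#print(soup)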

The program returns this when I try to print the soup:

I think the site is protected by JavaScript. Try this; it might help:

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
print(r.text)

# If you only need part of the content, you can slice the response text stored in r, or better, extract it with bs4

soup = BeautifulSoup(r.text, "html.parser")
print(soup.text)
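One possible reason print(soup) looks like random code is that the page carries inline <script> and <style> blocks, which are printed along with the markup. The screenshot is not available here, so this is only a guess. The sketch below uses only the standard library so that it runs without extra packages, and it is not the code from the question; sample_html, VisibleTextParser and visible_text are hypothetical names made up for illustration. With BeautifulSoup, a common equivalent is to remove the script and style tags before calling get_text().

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the human-readable text of an HTML string, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample page standing in for r.text from the snippet above
sample_html = """
<html><head><script>var a = 1; if (a < 2) { document.title = "x"; }</script>
<style>p { color: red; }</style></head>
<body><h1>Title</h1><p>Hello world</p></body></html>
"""

print(visible_text(sample_html))
```

If the text you want is missing even after this kind of filtering, the page is probably building its content in the browser with JavaScript, and a plain HTTP request will not see it.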

Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish this post, please cite this site's URL or the original address.

 