Problem with Python web scraping

Question

I am using requests-HTML and Beautiful Soup to scrape a website; the code is below. The odd thing is that print(soup.get_text()) only sometimes returns the page text, and print(soup) shows what looks like random code (see the attached image).

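from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

session = HTMLSession()
r = session.get(url)  # url: the page being scraped (not shown in the question)
soup = bs(r.content, "html.parser")
print(soup.get_text())
#print(soup)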

The program returns this when I print the soup:

Answer 1

I think the site is protected by JavaScript. Try this; it might help:

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
print(r.text)

# If you want the whole content, you can slice the response text stored in r, or better, parse it with bs4

soup = BeautifulSoup(r.text, "html.parser")
print(soup.text)
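
One way to check the guess that the page is mostly JavaScript is to measure how much of the response sits inside <script> tags. The snippet below is only a rough sketch using the standard library's html.parser; the names ScriptRatioParser and script_ratio and the sample string are made up for illustration, and in practice you would pass r.text instead of the sample.

```python
from html.parser import HTMLParser


class ScriptRatioParser(HTMLParser):
    """Count characters inside <script> tags versus ordinary page text."""

    def __init__(self):
        super().__init__()
        self._current = None  # "script" or "style" while inside such a tag
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        size = len(data.strip())
        if self._current == "script":
            self.script_chars += size
        elif self._current is None:
            self.text_chars += size
        # text inside <style> is ignored: it is neither script nor page text


def script_ratio(html):
    """Return the fraction of counted characters that are inside <script> tags."""
    parser = ScriptRatioParser()
    parser.feed(html)
    parser.close()
    total = parser.script_chars + parser.text_chars
    return parser.script_chars / total if total else 0.0


# Hypothetical sample standing in for r.text
sample = "<html><body><p>Hi</p><script>var a = 1; challenge(a);</script></body></html>"
print(round(script_ratio(sample), 2))
```

A ratio close to 1 suggests that the server sent a script-driven page, so print(soup) would show mostly JavaScript and very little readable text. It is only a heuristic, not proof that the site blocks scrapers.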

Answer 1 posted: 2020-08-14 04:36:02