Python bs4: how to scrape this text in HTML?
The site URL: https://n.news.naver.com/mnews/article/421/0006111920
I want to scrape the "5" from the HTML below.
I used this code: soup.select_one('span.u_likeit_text._count').get_text()
The result is '추천'.
HTML code:
<span class="u_likeit_text _count num">5</span>
The main issue here is that the count is generated dynamically by JavaScript, so it is not present in the response and therefore not in your soup.
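To make the point concrete, here is a minimal sketch that parses two hand-written snippets. Both strings are assumptions standing in for the real page: raw_html imitates what the server might send before any script runs (with the '추천' label from the question as placeholder text), and rendered_html imitates the DOM after the browser has filled in the count.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTTP response body: the span exists,
# but holds a placeholder label until JavaScript fills in the count.
raw_html = '<span class="u_likeit_text _count num">추천</span>'

# Hypothetical stand-in for the DOM after the browser has run the scripts.
rendered_html = '<span class="u_likeit_text _count num">5</span>'

selector = 'span.u_likeit_text._count'

# BeautifulSoup only sees the markup it is given; it never executes scripts.
raw_text = BeautifulSoup(raw_html, 'html.parser').select_one(selector).get_text()
rendered_text = BeautifulSoup(rendered_html, 'html.parser').select_one(selector).get_text()

print(raw_text)       # 추천
print(rendered_text)  # 5
```

The selector is the same in both cases; only the markup handed to BeautifulSoup differs, which is why the fix has to happen at the fetching step rather than in the selector.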
You could use selenium to render the page as a browser would, then pass driver.page_source to BeautifulSoup:
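from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://n.news.naver.com/mnews/article/421/0006111920")
time.sleep(3)  # crude wait so the JavaScript can fill in the count
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # close the browser once the rendered HTML is captured
print(soup.select_one('span.u_likeit_text._count').get_text())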
Output:
8
With find_all, pass the classes to class_ as one space-separated string, exactly as they appear in the class attribute, instead of joining them with dots.
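from bs4 import BeautifulSoup
# Parse the snippet from the question directly
soup = BeautifulSoup("<span class='u_likeit_text _count num'>5</span>", 'html.parser')
print(soup)
# class_ with several classes is compared against the full attribute string
seven_day = soup.find_all("span", class_="u_likeit_text _count num")
print(seven_day[0].text)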
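As a side-by-side check, the sketch below runs both lookup styles against the same static snippet from the question. The variable names are made up for illustration. It suggests that the dotted CSS selector is also valid for select_one, while a multi-class string given to class_ only matches when it equals the attribute value exactly.

```python
from bs4 import BeautifulSoup

html = "<span class='u_likeit_text _count num'>5</span>"
soup = BeautifulSoup(html, 'html.parser')

# CSS selector: dots chain classes on one element, and order does not matter.
css_match = soup.select_one('span._count.u_likeit_text')

# class_ with a multi-class string is compared to the exact attribute value.
exact_match = soup.find('span', class_='u_likeit_text _count num')
reordered = soup.find('span', class_='num _count u_likeit_text')

# class_ with a single class name matches any element carrying that class.
single = soup.find('span', class_='_count')

print(css_match.get_text())    # 5
print(exact_match.get_text())  # 5
print(reordered)               # None
print(single.get_text())       # 5
```

Because of the exact-string comparison, a single class name or a CSS selector is usually the more robust choice when the site may reorder or add classes.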