
Python bs4: how to scrape this text from HTML?

The site URL: https://n.news.naver.com/mnews/article/421/0006111920

I want to scrape the "5" from the HTML below.

I used this code: soup.select_one('span.u_likeit_text._count').get_text()

The result is '추천' (Korean for "recommend").

HTML code:

<span class="u_likeit_text _count num">5</span>

The main issue here is that the count is generated dynamically by JavaScript, so it is not present in the server response and therefore not in your soup.
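
To see the difference without hitting the site, here is a minimal sketch that parses two hypothetical snippets: one standing in for the raw server response, where the span only holds the '추천' label, and one for the DOM after JavaScript has run. Both snippets are assumptions modelled on the question, not the actual Naver markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins modelled on the question (not the real Naver markup).
RAW_RESPONSE = '<span class="u_likeit_text _count">추천</span>'    # as fetched without JavaScript
RENDERED_DOM = '<span class="u_likeit_text _count num">5</span>'   # after the browser ran the scripts

SELECTOR = "span.u_likeit_text._count"


def like_text(html: str) -> str:
    """Return the text of the first span matching SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(SELECTOR).get_text()


raw_text = like_text(RAW_RESPONSE)
rendered_text = like_text(RENDERED_DOM)
print(raw_text)       # the placeholder label
print(rendered_text)  # the count
```

The selector is identical in both calls; only the HTML handed to BeautifulSoup differs.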

You could use selenium to render the page as a browser would, then pass driver.page_source to BeautifulSoup:

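from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

driver.get("https://n.news.naver.com/mnews/article/421/0006111920")
time.sleep(3)  # crude fixed wait so the JavaScript can fill in the count

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # the HTML is already captured, so the browser can be closed

print(soup.select_one('span.u_likeit_text._count').get_text())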

Output:

8

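Alternatively, with find_all you can pass the classes as a single space-separated string instead of chaining them with dots. Note that class_ then matches the exact value of the class attribute, and that this only helps when the count is already present in the HTML you parse.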

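from bs4 import BeautifulSoup

# Static snippet that already contains the count
soup = BeautifulSoup("<span class='u_likeit_text _count num'>5</span>", 'html.parser')
print(soup)
# A space-separated class_ string must equal the class attribute exactly
count_spans = soup.find_all("span", class_="u_likeit_text _count num")
print(count_spans[0].text)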
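
For comparison, a small sketch of how the two lookup styles behave on that same static snippet. The dotted CSS selector also matches, in any class order, whereas a space-separated class_ string is compared against the exact attribute value. The reordered string is just an illustrative input.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<span class='u_likeit_text _count num'>5</span>", "html.parser")

# CSS selector: dots chain classes, and their order does not matter.
by_selector = soup.select("span.num._count.u_likeit_text")

# class_ with spaces: compared against the exact class attribute string.
by_exact_string = soup.find_all("span", class_="u_likeit_text _count num")
by_reordered_string = soup.find_all("span", class_="num _count u_likeit_text")

# class_ with a single name matches any element carrying that class.
by_single_class = soup.find_all("span", class_="_count")

print(len(by_selector), len(by_exact_string), len(by_reordered_string), len(by_single_class))
```

Only the reordered class_ string comes back empty, which suggests the dotted selector in the question was not what caused the wrong result.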

