Extract text from an HTML string with Beautiful Soup

I wrote the following code to extract the price from a webpage:

from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.teleborsa.it/azioni/intesa-sanpaolo-isp-it0000072618-SVQwMDAwMDcyNjE4"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
price = soup.select('.h-price')
print(price)

The output is:

<span class="h-price fc0" id="ctl00_phContents_ctlHeader_lblPrice">1,384</span>

I want to extract just the value 1,384.

If you are working in the browser with JavaScript, try this:

document.getElementById("ctl00_phContents_ctlHeader_lblPrice").innerText

If the elements are generated dynamically, you can iterate over each matching element and read its innerText.
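The JavaScript above runs in a browser, not in Python. A rough Beautiful Soup counterpart is sketched below: look the tag up by its id, or iterate over every tag matching the class selector. It assumes a static copy of the span from the question's output instead of the live page, and uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Assumption: a static copy of the span shown in the question's output,
# used instead of fetching the live page so the sketch runs offline.
html = '<span class="h-price fc0" id="ctl00_phContents_ctlHeader_lblPrice">1,384</span>'
soup = BeautifulSoup(html, "html.parser")

# Counterpart of document.getElementById(...).innerText:
# find the tag by its id attribute and read its text.
tag = soup.find(id="ctl00_phContents_ctlHeader_lblPrice")
price_by_id = tag.get_text(strip=True) if tag is not None else None

# Counterpart of iterating over several elements: collect the text of
# every tag that matches the CSS class selector.
prices = [t.get_text(strip=True) for t in soup.select(".h-price")]

print(price_by_id)  # 1,384
print(prices)       # ['1,384']
```

Note that Beautiful Soup only sees the HTML the server returns. Content that the page fills in with JavaScript after loading will not be in the parsed document.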

You can use the .text property to get the desired text.

For example:

from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.teleborsa.it/azioni/intesa-sanpaolo-isp-it0000072618-SVQwMDAwMDcyNjE4"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
price = soup.select_one('.h-price')  # <- change to .select_one() to get only one element
print(price.text)                    # <- use the .text property to get text of the tag

Prints:

1,384
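select_one() returns None when nothing matches, so calling .text on its result raises AttributeError if the page layout changes. The sketch below wraps the same approach in a guard. extract_price is a hypothetical helper name, and the sample HTML is the span from the question, padded with spaces for illustration so you can see what get_text(strip=True) removes.

```python
from bs4 import BeautifulSoup


def extract_price(html: str, selector: str = ".h-price"):
    """Hypothetical helper: stripped text of the first tag matching `selector`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)  # first match only, or None if nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True)  # same text as .text, minus surrounding whitespace


# Assumption: the span from the question, with spaces added around the value.
sample = '<div><span class="h-price fc0" id="ctl00_phContents_ctlHeader_lblPrice"> 1,384 </span></div>'
print(extract_price(sample))             # 1,384
print(extract_price("<p>no price</p>"))  # None
```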
