Python - How to extract a number from a bs4 output

Question

I am trying to get a price from a website using BeautifulSoup and so far I have managed to get:

<h2>£<!-- -->199.99</h2>

I just want to get '£199.99'. Is there a way to filter out the rest of the markup?

Thanks in advance

Answer 1

You can use the get_text() method, passing strip=True to trim surrounding whitespace if necessary:

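from bs4 import BeautifulSoup

html = '<h2>£<!-- -->199.99</h2>'
# The html5lib parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')

# get_text() skips the HTML comment and joins the remaining text nodes
result = soup.find('h2').get_text(strip=True)

print(result)
# £199.99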
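If the goal is an actual number rather than a string (as the title suggests), the text returned by get_text() can be converted afterwards. The following is only a minimal sketch: parse_price is a hypothetical helper name, and it assumes prices use a dot as the decimal separator and, optionally, commas as thousands separators.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None if there is none."""
    # Assumption: commas are thousands separators (e.g. "1,299.00"), so drop them.
    cleaned = text.replace(",", "")
    # Match an integer part with an optional fractional part.
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    return Decimal(match.group()) if match else None


print(parse_price("£199.99"))  # 199.99
```

Decimal is used here instead of float so that the amount is kept exactly as written on the page.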

Answer 2

You could also use the re module:

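import re

s = "<h2>£<!-- -->199.99</h2>"

# Matches a run of digits and dots, e.g. "199.99"
rx_price = re.compile(r'([0-9.]+)')

# Strip anything that looks like a tag, including the <!-- --> comment
content = re.sub(r'<.+?>', '', s)

print(f"£{rx_price.findall(content)[0]}")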

Output:

£199.99
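Note that indexing the findall() result with [0] raises an IndexError when the text contains no digits, and that [0-9.]+ also matches a lone dot. A slightly more defensive variant might look like the sketch below; extract_price is a hypothetical helper name, and the pattern assumes the price is always written with a leading £ sign.

```python
import re

TAG_RE = re.compile(r"<.+?>")  # same tag-stripping pattern as in the answer
# Assumption: a pound sign, optional whitespace, then digits with an optional fraction.
PRICE_RE = re.compile(r"£\s*([0-9]+(?:\.[0-9]+)?)")


def extract_price(html):
    """Return the price as a string such as '£199.99', or None if no price is found."""
    text = TAG_RE.sub("", html)
    match = PRICE_RE.search(text)
    return f"£{match.group(1)}" if match else None


print(extract_price("<h2>£<!-- -->199.99</h2>"))  # £199.99
print(extract_price("<h2>Sold out</h2>"))         # None
```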
