I am trying to scrape data from https://www.mygov.in/covid-19 using Selenium, but I ran into a problem when extracting the numbers. Each figure shows the current value followed by how much it changed, e.g. 3,81,74,366⬆54,229.
When I scrape it, I get the text 3,81,74,36654,229. How can I get only the current value using Selenium in Python?
For example:
3,81,74,36654,229 to 3,81,74,366
10,79,894198 to 10,79,894
22,40,7200 to 22,40,720
Here's an extract of an HTML fragment from that page:
<p class="mid-wrap">8,43,56,092
<span class="data-up">39,477</span>
</p>
If you get the text of the p element, the returned value also includes the text of the nested span, so the two numbers run together.
Consider removing the span before reading the text of the p element:
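# `soup` is assumed to be a BeautifulSoup object built from the page source,
# e.g. BeautifulSoup(driver.page_source, 'html.parser')
for p in soup.select('p.mid-wrap'):
    span = p.find('span')
    if span:
        # The span holds the change value
        spantext = span.get_text(strip=True)
        print(spantext)
        # Remove the span from the tree so it no longer contributes to p's text
        span.extract()
    # Only the current value is left in the paragraph
    print(p.get_text(strip=True))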
Output:
39,477
8,43,56,092
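Since the question asks about Selenium specifically, the same idea can be sketched without BeautifulSoup: read the text of the p element and the text of its span, then cut the span text off the end. The helper name `strip_change` and the `driver` variable in the comment are assumptions for illustration, and the exact whitespace Selenium returns between the two numbers may differ from what is shown here.

```python
def strip_change(full_text: str, change_text: str) -> str:
    """Return the parent's text with the trailing child (span) text removed."""
    full = full_text.strip()
    change = change_text.strip()
    if change and full.endswith(change):
        full = full[: -len(change)]
    # Drop any whitespace or arrow glyph left between the two numbers.
    return full.rstrip(" \n\t⬆⬇")


# Hypothetical Selenium usage (assumes `driver` has already loaded the page):
#
#   from selenium.webdriver.common.by import By
#   for p in driver.find_elements(By.CSS_SELECTOR, "p.mid-wrap"):
#       spans = p.find_elements(By.TAG_NAME, "span")
#       change = spans[0].text if spans else ""
#       print(strip_change(p.text, change))

print(strip_change("8,43,56,092\n39,477", "39,477"))  # 8,43,56,092
print(strip_change("22,40,7200", "0"))                # 22,40,720
```

Because the span text is looked up separately, this does not depend on how many digits the change value has.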
Assuming all numbers are bigger than one thousand and the current value is the first thing in the string, something like this should work:
^.*?,\d{3}
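A minimal sketch of applying that pattern with Python's `re` module to the three strings from the question. The function name `current_value` and the choice to return the input unchanged when nothing matches are assumptions, not part of the answer. The pattern relies on the Indian digit grouping used on the page, where only the last group has three digits; a Western-grouped number such as 1,234,567 would be cut to 1,234.

```python
import re

# Lazily match up to the first comma that is followed by three digits,
# which is where the last group of an Indian-grouped number ends.
CURRENT_VALUE = re.compile(r"^.*?,\d{3}")


def current_value(text: str) -> str:
    """Return the leading current value, or the stripped text if no match."""
    text = text.strip()
    match = CURRENT_VALUE.match(text)
    return match.group(0) if match else text


for raw in ["3,81,74,36654,229", "10,79,894198", "22,40,7200"]:
    print(raw, "->", current_value(raw))
# 3,81,74,36654,229 -> 3,81,74,366
# 10,79,894198 -> 10,79,894
# 22,40,7200 -> 22,40,720
```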