Split digit using regex python

I am trying to scrape data from https://www.mygov.in/covid-19 using Selenium, but extracting the digits raises a new problem. Each number on the page shows the current value followed by how much it changed, e.g. 3,81,74,366⬆54,229.

When I scrape it, I get the text as 3,81,74,36654,229. How can I get only the current value using Selenium and Python?

eg:
3,81,74,36654,229 to 3,81,74,366
10,79,894198 to 10,79,894
22,40,7200 to 22,40,720

Here's an extract of an HTML fragment from that page:

<p class="mid-wrap">8,43,56,092
  <span class="data-up">39,477</span>
</p>

If you get the text of the p element, the returned string also includes the text of the nested span, which is why the two numbers run together.

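Consider removing the span before reading the text of the p element:

# Requires: from bs4 import BeautifulSoup
# soup is assumed to be a BeautifulSoup object built from the page source
for p in soup.select('p.mid-wrap'):
    span = p.find('span')
    if span:
        # The span holds the change value; print it, then detach it from the p
        print(span.get_text(strip=True))
        span.extract()
    # With the span removed, only the current value is left
    print(p.get_text(strip=True))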

Output:

39,477
8,43,56,092
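
Since the question asks about Selenium, the same idea can be sketched without BeautifulSoup: read the text of the p element and of its span, then drop the span text from the end of the p text. This is only a sketch. The helper name strip_child_text is hypothetical, and the Selenium calls in the comments assume an existing driver object that has already loaded the page.

```python
def strip_child_text(parent_text: str, child_text: str) -> str:
    """Remove a trailing child-element text from its parent's text."""
    parent_text = parent_text.strip()
    child_text = child_text.strip()
    # Only cut when the parent text really ends with the child text
    if child_text and parent_text.endswith(child_text):
        parent_text = parent_text[: -len(child_text)]
    return parent_text.strip()


# Assumed Selenium usage (driver is not defined in this sketch):
#   from selenium.webdriver.common.by import By
#   p = driver.find_element(By.CSS_SELECTOR, "p.mid-wrap")
#   span = p.find_element(By.TAG_NAME, "span")
#   current = strip_child_text(p.text, span.text)

print(strip_child_text("8,43,56,09239,477", "39,477"))
```

Because the span text is taken from the page itself, this does not depend on how many digits either number has.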

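Assuming every number is at least one thousand and the current value comes first in the string, the following should work. With Indian digit grouping only the last group of a number has three digits, so a lazy match up to the first comma followed by three digits stops at the end of the current value: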

^.*?,\d{3}
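
A minimal sketch of applying that pattern with Python's re module to the merged strings from the question. The names CURRENT_VALUE and current_value are hypothetical, and the same assumptions hold: Indian digit grouping and a current value of at least one thousand.

```python
import re
from typing import Optional

# Lazy match from the start up to the first comma followed by three digits
CURRENT_VALUE = re.compile(r"^.*?,\d{3}")


def current_value(merged: str) -> Optional[str]:
    """Return the leading current value, or None if the pattern does not match."""
    match = CURRENT_VALUE.match(merged.strip())
    return match.group(0) if match else None


for text in ["3,81,74,36654,229", "10,79,894198", "22,40,7200"]:
    print(text, "to", current_value(text))
# 3,81,74,36654,229 to 3,81,74,366
# 10,79,894198 to 10,79,894
# 22,40,7200 to 22,40,720
```

A value below one thousand contains no comma, so current_value returns None for it and that case would need separate handling.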
