
Find # of pages in a multipage table

I'm trying to extract the # of pages from the HTML returned by a multipage table URL.

HTML=<span style="float:right">Page 1 of 63,917</span>

Need to extract 63917.

I used

soup = bsoup(r.text)
pages=re.findall(r"Page 1 of\s(.+)<\/span>", str(soup))
print(pages)

But print(pages) shows a whole lot of HTML, right up to the end of the body:

##'63,917</span></div><table class="table table-striped##

Why doesn't my regex work? And how do I extract only the # from the HTML response?

Your regex does not work because the capture group (.+) is greedy. As written, .+ matches everything after Page 1 of\s up to the last </span> it can reach (the last one on that line, since . does not match newlines by default). Make the capture non-greedy by adding a ? after the + , like this:

Page 1 of\s(.+?)<\/span>
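
To see the difference, here is a minimal sketch. The sample HTML is modeled on the snippet in the question, but the tags after the first </span> are made up for illustration. The page_count helper and its [\d,]+ pattern are my own assumptions, not part of the fix above; they are just one way to get from '63,917' to the integer 63917.

```python
import re

# Sample markup modeled on the question; everything after the first
# </span> is invented so that a second </span> appears later on the line.
html = (
    '<div><span style="float:right">Page 1 of 63,917</span></div>'
    '<table class="table table-striped"><tr><td><span>x</span></td></tr></table>'
)

# Greedy: .+ runs to the LAST </span> it can reach.
greedy = re.findall(r"Page 1 of\s(.+)</span>", html)

# Non-greedy: .+? stops at the FIRST </span>.
lazy = re.findall(r"Page 1 of\s(.+?)</span>", html)


def page_count(markup):
    """Hypothetical helper: return the total page count as an int, or None."""
    # Capture only digits and commas, so no closing tag is needed in the pattern.
    match = re.search(r"Page \d+ of\s([\d,]+)", markup)
    if match is None:
        return None
    # Drop the thousands separator before converting.
    return int(match.group(1).replace(",", ""))


print(greedy)
print(lazy)
print(page_count(html))
```

The non-greedy pattern still returns the string '63,917', so the comma has to be removed before calling int(). Matching only digits and commas avoids depending on what follows the number in the markup.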
