Find # of pages in a multipage table

Question

I'm trying to extract the # of pages in a multipage table URL

HTML=<span style="float:right">Page 1 of 63,917</span>

Need to extract 63917.

I used

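import re
from bs4 import BeautifulSoup as bsoup  # assumed: bsoup is an alias for BeautifulSoup

soup = bsoup(r.text, "html.parser")  # r is the HTTP response for the table page
pages = re.findall(r"Page 1 of\s(.+)<\/span>", str(soup))
print(pages)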

But print(pages) returns a large chunk of HTML that runs right to the end of the body:

##'63,917</span></div><table class="table table-striped##

Why doesn't my regex work? And how do I extract only the number from the HTML response?

Answer 1

Your regex does not work because the capture group (.+) is greedy. As written, .+ matches everything after Page 1 of\s up to the last </span> it can reach, not the first one. Make the quantifier non-greedy by adding a ? after the +, like this:

Page 1 of\s(.+?)<\/span>
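
As a minimal sketch of the difference, the snippet below runs both patterns against a shortened sample string. The sample HTML and the helper name extract_page_count are made up for illustration; in the question the input would be str(soup) or r.text. It also strips the thousands separator so the result is the integer 63917.

```python
import re

# Hypothetical sample: the span from the question followed by more markup,
# including a later </span> that the greedy pattern runs on to.
html = (
    '<span style="float:right">Page 1 of 63,917</span></div>'
    '<table class="table table-striped"><span>x</span>'
)

def extract_page_count(text):
    """Return the page count as an int, or None if the marker is not found."""
    # Non-greedy (.+?) stops at the first </span> after the number.
    match = re.search(r"Page 1 of\s(.+?)</span>", text)
    if match is None:
        return None
    # Drop the thousands separator before converting: "63,917" -> 63917.
    return int(match.group(1).replace(",", ""))

# Greedy version from the question, kept for comparison.
greedy = re.findall(r"Page 1 of\s(.+)</span>", html)

page_count = extract_page_count(html)
print(greedy)
print(page_count)
```

If the page ever shows something other than "Page 1 of", a stricter pattern such as Page \d+ of\s([\d,]+) would capture only digits and commas and would not need the closing tag at all.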

solution1
1 ACCEPTED 2019-02-11 06:45:46