I want to search a file for a number that match this pattern:
<a href="test/?n=451484" >
then get the number 451484
:
I use this pattern :
'
(test/?n=)
\d+
'
but that doesn't work ?
3 changes
escape the ?
wrap the d+
in paranthesis
drop paranthesis around test\\?n=
Example usage
>>> import re
>>> str='<a href="test/?n=451484" >'
>>> re.findall(r'test/\?n=(\d+)', str)
['451484']
To search for a literal ?
character, you need to escape it with a \\
. ?
is a special character in regexes, and cannot (usually) be used on its own.
pattern = r"test/\?n=(\d+)"
Alternatively, you can use specialized tools:
BeautifulSoup
) urlparse
to extract the url parameter value Example:
import re
from urlparse import urlparse, parse_qs
from bs4 import BeautifulSoup
data = """
<div>
<a href="test/?n=451484">link</a>
</div>
"""
soup = BeautifulSoup(data)
# filtering links with a specific "href" attribute value
link = soup.find('a', href=re.compile(r'test/\?n=\d+'))
url = link['href']
query = urlparse(url).query
print parse_qs(query)['n'][0] # prints 451484
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.