
Python and Regex - extracting a number from a string

I'm new to regex, and I'm starting to sort of get the hang of things. I have a string that looks like this:

This is a generated number #123 which is an integer.

The text around the 123 will always stay exactly the same, though there may be further text on either side. The number itself may be 123, 597392, or any run of one or more digits. I believe I can match the number and the following text using \d+(?= which is an integer.) , but how do I write the look-behind part?

When I try (?<=This is a generated number #)\d+(?= which is an integer.) , it does not match when I test it on regexpal.com.

Also, how would I use Python to get this into a variable (stored as an int)?

NOTE: I only want to find the numbers that are sandwiched in between the text I've shown. The string might be much longer with many more numbers.

You don't really need a fancy regex. Just use a group on what you want.

re.search(r'#(\d+)', 'This is a generated number #123 which is an integer.').group(1)

If you want to match a number in the middle of some known text, follow the same rule:

r'some text you know (\d+) other text you also know'

import re

# Raw string so that \d reaches the regex engine unchanged
res = re.search(r'#(\d+)', 'This is a generated number #123 which is an integer.')

if res is not None:
    integer = int(res.group(1))
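
The same check can be wrapped in a small function so the "no match" case is handled in one place. This is only a sketch: the names HASH_NUMBER and first_hash_number are made up for illustration, and the pattern is the simple '#(\d+)' one from above, so it returns the first number that follows any '#'.

```python
import re
from typing import Optional

# Hypothetical names, used only for this illustration.
HASH_NUMBER = re.compile(r'#(\d+)')

def first_hash_number(text: str) -> Optional[int]:
    """Return the first number that follows a '#', or None if there is none."""
    match = HASH_NUMBER.search(text)
    if match is None:
        return None
    # group(1) is the text captured by (\d+), without the leading '#'
    return int(match.group(1))

integer = first_hash_number('This is a generated number #123 which is an integer.')
print(integer)
```

Returning None instead of raising keeps the caller's "if ... is not None" check the same as in the snippet above.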

If you want the number only when it is preceded by "This is a generated number #" AND followed by " which is an integer.", you don't need look-behind or lookahead. You can simply match the whole sentence and capture the digits (the final period is escaped so that it matches a literal "."):

r"This is a generated number #(\d+) which is an integer\."

I am not sure if I understood what you really want though. :)

updated

In [16]: a='This is a generated number #123 which is an integer.'

In [17]: b='This should be a generated number #123 which could be an integer.'

In [18]: exp=r"This is a generated number #(\d+) which is an integer\."

In [19]: result = re.search(exp, a)

In [20]: int(result.group(1))
Out[20]: 123

In [21]: result = re.search(exp, b)

In [22]: result is None
Out[22]: True
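
For what it's worth, the look-behind pattern from the question should also work in Python itself, because the re module supports look-behind as long as the look-behind text has a fixed width. The failure on regexpal.com is probably down to the regex engine that tester runs on (an assumption on my part, I have not checked it). A rough side-by-side sketch, with a made-up input string:

```python
import re

# Made-up input: the known sentence with extra text on either side.
text = 'Prefix. This is a generated number #597392 which is an integer. Suffix #42.'

# Lookaround version: the match itself is only the digits.
lookaround = re.compile(
    r'(?<=This is a generated number #)\d+(?= which is an integer\.)')

# Capturing-group version: the match is the whole sentence, group(1) is the digits.
grouped = re.compile(r'This is a generated number #(\d+) which is an integer\.')

m1 = lookaround.search(text)
m2 = grouped.search(text)

number_from_lookaround = int(m1.group(0)) if m1 else None
number_from_group = int(m2.group(1)) if m2 else None

print(number_from_lookaround, number_from_group)
```

Both variants pick out the same digits here. The capturing-group form is usually the easier one to carry over to engines that lack look-behind.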

You can just use findall() from the re module.

import re

string = "This is a string that contains #134534 and other things"
match = re.findall(r'#\d+ .+', string)
print(match)

findall() returns a list, so the output would be ['#134534 and other things']

This matches a # followed by a number of any length (#123 or #123235345), then a space, then the rest of the line up to the next newline character.
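
Since the question notes that the string may be much longer and contain many other numbers, findall() can be combined with the full surrounding text and a capturing group so that only the sandwiched numbers come back. A minimal sketch, using an invented longer string:

```python
import re

# Invented input: several numbers, only two of them inside the known sentence.
text = ('Order 77. This is a generated number #123 which is an integer. '
        'Ticket #999 closed. This is a generated number #597392 which is an integer.')

pattern = re.compile(r'This is a generated number #(\d+) which is an integer\.')

# With exactly one capturing group, findall() returns the captured strings.
numbers = [int(n) for n in pattern.findall(text)]
print(numbers)
```

The 77 and the #999 are skipped because they are not surrounded by the known text.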
