
Regular expression to separate out the last occurring number using Python

I have a regular expression which separates out the number from the given string.

import re

username = "testuser1"
xp = r'^\D+'  # leading run of non-digit characters
ma = re.match(xp, username)
user_prefix = ma.group(0)
print(user_prefix)

The output is:

testuser

But if the username is something like below

username = "testuser1-1"

I am getting the following output

testuser

which is what the pattern is expected to give. But I am looking for the following:

testuser1-

Basically, the regular expression should strip the last whole number in the string (not individual digits).

In summary:

input = "testuser1"
>>> output = testuser
input = "testuser1-1"
>>> output = testuser1-
input = "testuser1-2000"
>>> output = testuser1-

Can a single regular expression handle all of the above cases?

You can use re.sub with a lookbehind assertion:

re.sub(r'(?<=\D)\d+$', '', username)

A shorter version, which gives the same result for the inputs in the question:

re.sub(r'\d+$', '', username)

re.sub is better suited to this case, since the goal is to remove the trailing number rather than to extract a match.

Test cases:

re.sub(r'\d+$', '', "testuser1-100")
# 'testuser1-'

re.sub(r'\d+$', '', "testuser1-1")
# 'testuser1-'

re.sub(r'\d+$', '', "testuser1")
# 'testuser'
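The two patterns above only differ when the whole string is digits. Here is a small sketch of that difference; the function names `strip_with_lookbehind` and `strip_plain` are just illustrative, not anything from the `re` module:

```python
import re

def strip_with_lookbehind(name):
    # Removes trailing digits only when a non-digit character precedes them.
    return re.sub(r'(?<=\D)\d+$', '', name)

def strip_plain(name):
    # Removes any run of digits at the end of the string.
    return re.sub(r'\d+$', '', name)

for name in ["testuser1", "testuser1-1", "testuser1-2000", "2000"]:
    print(name, repr(strip_with_lookbehind(name)), repr(strip_plain(name)))
# For "2000" the lookbehind version leaves the string untouched,
# while the plain version returns an empty string.
```

So if a purely numeric username should survive intact, the lookbehind form is probably the safer choice; otherwise the shorter one does the same job.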

A solution using re.match:

import re
username = "testuser1"
xp = r'^(.+?)\d+$'
ma = re.match(xp, username)
user_prefix = ma.groups()[0]
user_prefix
# 'testuser'

# you can also capture the last number
xp = r'^(.+?)(\d+)$'
ma = re.match(xp, username)
user_prefix, user_number = ma.groups()
user_prefix, user_number
# ('testuser', '1')

print(re.match(xp, "testuser1-2000").groups())
# ('testuser1-', '2000')
re.match(xp, "testuser1-2000").groups()[0]
# 'testuser1-'
re.match(xp, "testuser1-2000").group(1)
# 'testuser1-'
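One thing to watch with the re.match approach: if the username has no trailing number, re.match returns None and calling .groups() on it raises AttributeError. A minimal sketch that guards against this, assuming you want the unchanged name back in that case (the helper name `split_trailing_number` is made up for illustration):

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
TRAILING_NUMBER = re.compile(r'^(.+?)(\d+)$')

def split_trailing_number(name):
    """Return (prefix, number) or (name, None) if there is no trailing number."""
    m = TRAILING_NUMBER.match(name)
    if m is None:
        return name, None
    return m.group(1), m.group(2)

print(split_trailing_number("testuser1-2000"))  # ('testuser1-', '2000')
print(split_trailing_number("testuser"))        # ('testuser', None)
```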

Here you go:

regex_ = r'\w+-?(?:\d+)?'

A version that makes the regex engine do less work (treating - as the only delimiter):

^([^\s-]+-|\D+)
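To see how that pattern behaves on the inputs from the question, here is a quick check. The wrapper `prefix_of` is a hypothetical helper, and the last case is my own addition to show a limit of the "single hyphen" assumption:

```python
import re

PREFIX = re.compile(r'^([^\s-]+-|\D+)')

def prefix_of(name):
    # Either everything up to and including the first hyphen,
    # or, if there is no hyphen, the leading run of non-digits.
    m = PREFIX.match(name)
    return m.group(1) if m else None

print(prefix_of("testuser1"))       # testuser
print(prefix_of("testuser1-1"))     # testuser1-
print(prefix_of("testuser1-2000"))  # testuser1-
print(prefix_of("a-b-1"))           # a-  (stops at the first hyphen)
```

It works for the three cases asked about, but with more than one hyphen it stops at the first, so it is only suitable if the names really have at most one.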

I'd suggest starting from the end, removing characters one at a time and stopping at the first non-digit.
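That idea needs no regex at all. A rough sketch, assuming only ASCII digits 0-9 count as digits (the function name `strip_trailing_digits` is illustrative):

```python
DIGITS = "0123456789"

def strip_trailing_digits(name):
    # Walk backwards from the end until a non-digit character is found.
    end = len(name)
    while end > 0 and name[end - 1] in DIGITS:
        end -= 1
    return name[:end]

print(strip_trailing_digits("testuser1"))       # testuser
print(strip_trailing_digits("testuser1-2000"))  # testuser1-
```

The built-in name.rstrip("0123456789") does the same thing in one call, since str.rstrip removes any trailing characters found in the given set.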

