Using Python and Regex get last occurrence and remaining part
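I am trying to use Python and a regular expression to get the last group of integers in a file name (a string). The approach below does what I need, but I would also like to get back the inverse, i.e. the remaining part of the string that the regex did not match. How can I do that?
This is the regex: ([0-9]+|#+)(?!.*([0-9]+|#+))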
import re
values = [
'image.0001',
'image###',
'###image###',
'image001',
'image_001',
'001',
'0001.image',
'001image',
'001_image',
'image',
'01_image01',
'03_image01',
]
pattern = '([0-9]+|#+|@+)'
regex = '{0}(?!.*{0})'.format(pattern)
for v in values:
    result = re.search(regex, v)
    if result:
        print(result.groups())
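Currently it returns something like ('01', None).
I would like it to return something like ('image', '0001').
Update
Alternatively, is there a way to split the string on the groups of digits? For example: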
'image.0001' > ['image.', '0001']
'image###' > ['image', '###']
'###image###' > ['###', 'image', '###']
'image001' > ['image', '001']
'image_001' > ['image_', '001']
'001' > ['001']
'0001.image' > ['0001', '.image']
'001image' > ['001', 'image']
'001_image' > ['001', '_image']
'image' > ['image']
'01_image01' > ['01', '_image', '01']
'03_image01' > ['03', '_image', '01']
Edit:
Use
re.findall(r'\d+|#+|@+|[^#@\d]+', v)
See the proof.
Explanation
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  #+                       '#' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  @+                       '@' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  [^#@\d]+                 any character except: '#', '@', digits
                           (0-9) (1 or more times (matching the most
                           amount possible))
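As a quick check of the explanation above, here is a minimal sketch that runs the same findall pattern over a few of the sample names. The names TOKEN_RE and split_groups are hypothetical helpers introduced only for this illustration.

```python
import re

# Same alternation as in the answer: a run of digits, a run of '#',
# a run of '@', or a run of anything that is none of those.
TOKEN_RE = re.compile(r'\d+|#+|@+|[^#@\d]+')

def split_groups(name):
    """Split a name into consecutive counter and non-counter chunks."""
    return TOKEN_RE.findall(name)

for v in ['image.0001', '###image###', '01_image01', 'image']:
    print(v, '>', split_groups(v))
```

Because the four alternatives together cover every possible character, concatenating the returned chunks gives back the original string.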
Original: use re.split, adding a capturing group so that the captured part is kept in the result:
import re
values = [
'image.0001',
'image###',
'###image###',
'image001',
'image_001',
'001',
'0001.image',
'001image',
'001_image',
'image',
'01_image01',
'03_image01',
]
pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))
for v in values:
    print(regex.split(v))
Results:
['image.', '0001', '']
['image', '###', '']
['###image', '###', '']
['image', '001', '']
['image_', '001', '']
['', '001', '']
['', '0001', '.image']
['', '001', 'image']
['', '001', '_image']
['image']
['01_image', '01', '']
['03_image', '01', '']
See the regex proof.
Explanation
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    #+                       '#' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    @+                       '@' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      #+                       '#' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      @+                       '@' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
  )                        end of look-ahead
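If a fixed-shape result is more convenient than the variable-length list from re.split, the same pattern can be used with re.search and string slicing. This is only a sketch of one possible variation, not part of the answer above; the names LAST_GROUP_RE and split_last_group are hypothetical.

```python
import re

PATTERN = r'[0-9]+|#+|@+'
# Same regex as above: a counter group that is not followed,
# anywhere later on the line, by another counter group.
LAST_GROUP_RE = re.compile(r'({0})(?!.*(?:{0}))'.format(PATTERN))

def split_last_group(name):
    """Return (prefix, last_group, suffix).

    last_group is None when the name contains no digits, '#' or '@'.
    """
    m = LAST_GROUP_RE.search(name)
    if m is None:
        return name, None, ''
    # Slice around the match to recover the parts the regex did not match.
    return name[:m.start()], m.group(1), name[m.end():]

print(split_last_group('image.0001'))
print(split_last_group('0001.image'))
print(split_last_group('image'))
```

Returning a 3-tuple in every case avoids having to check the length of the list that re.split produces when there is no match.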
import re
values = [
'image.0001',
'image###',
'###image###',
'image001',
'image_001',
'001',
'0001.image',
'001image',
'001_image',
'image',
'01_image01',
'03_image01',
]
for v in values:
    print(re.sub(r"[^A-Za-z]+", "", v), end=" ")
    print(re.sub(r"(.+[_.]){0,1}[^0-9]+", "", v))
Output:
image 0001
image
image
image 001
image 001
001
image
image 001
image
image
image 01
image 01
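The two substitutions above can be read as follows: the first deletes everything that is not an ASCII letter, and the second deletes an optional prefix ending in '_' or '.' together with the run of non-digits after it, which leaves only a trailing number. A small sketch that wraps them in a function makes the behaviour easier to inspect; the name letters_and_number is hypothetical.

```python
import re

def letters_and_number(name):
    """Return (letters, trailing_number) using the two substitutions above."""
    # Keep only ASCII letters; '#', '@', '_', '.' and digits are dropped.
    letters = re.sub(r"[^A-Za-z]+", "", name)
    # Drop an optional "<anything>_" or "<anything>." prefix plus the
    # non-digits that follow it; what remains is the number, if any.
    number = re.sub(r"(.+[_.]){0,1}[^0-9]+", "", name)
    return letters, number

for v in ['image.0001', 'image###', '001image', '0001.image', '01_image01']:
    print(v, '>', letters_and_number(v))
```

Note that this approach discards '#' and '@' placeholders entirely, and that a leading number followed by '.' or '_' (as in '0001.image') is removed along with the prefix, so it suits names where only the letters and a trailing number matter.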