Regex: remove the unwanted substring after the second occurrence of a hyphen in Python
Below are the strings from which I need to extract the meaningful IDs:
'12345-1-abcde-aBCD'
'123-Abcdefghi abcdefghijkl'
'1234567-1-AB-ABC A/1 ABC (AB1234)'
'12345-ABC-Abcdefghijkl'
'123456-Abcdefgh'
'12345-AB1CDE'
The regex should handle all of the cases above and produce the following output:
12345-1
123
1234567-1
12345
123456
12345
The regex should drop everything from a hyphen onward if what follows that hyphen contains letters.
You can do this:
import re
l = ['12345-1-abcde-aBCD',
     '123-Abcdefghi abcdefghijkl',
     '1234567-1-AB-ABC A/1 ABC (AB1234)',
     '12345-ABC-Abcdefghijkl',
     '123456-Abcdefgh',
     '12345-AB1CDE']
In [10]: for s in l:
    ...:     print(re.match(r'^(\d+[-]?\d+?)', s))
    ...:
<re.Match object; span=(0, 7), match='12345-1'>
<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(0, 9), match='1234567-1'>
<re.Match object; span=(0, 5), match='12345'>
<re.Match object; span=(0, 6), match='123456'>
<re.Match object; span=(0, 5), match='12345'>
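To get the IDs as plain strings, call .group() on each match object. Note that the lazy \d+? in the pattern above takes only one digit after the hyphen and needs at least two digits overall, so '12345-12-abc' would give 12345-1 and '1-abc' would not match at all. A minimal sketch of a variant that makes the whole "-digits" suffix optional instead; the helper name leading_id is made up for illustration, and it assumes the ID always sits at the start of the string:

```python
import re

# Leading digits, optionally followed by a single "-digits" suffix.
LEADING_ID = re.compile(r'^\d+(?:-\d+)?')

def leading_id(s):
    """Return the leading numeric ID of s, or None if s does not start with a digit."""
    m = LEADING_ID.match(s)
    return m.group() if m else None

samples = ['12345-1-abcde-aBCD',
           '123-Abcdefghi abcdefghijkl',
           '1234567-1-AB-ABC A/1 ABC (AB1234)',
           '12345-ABC-Abcdefghijkl',
           '123456-Abcdefgh',
           '12345-AB1CDE']

for s in samples:
    print(leading_id(s))
```

Because re.match returns None when nothing matches, the helper checks the result before calling .group().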
If there can be several hyphen-separated groups of digits, you can do something like:
l = ['12345-1-abcde-aBCD',
     '123-Abcdefghi abcdefghijkl',
     '1234567-1-AB-ABC A/1 ABC (AB1234)',
     '12345-ABC-Abcdefghijkl',
     '123456-Abcdefgh',
     '12345-AB1CDE',
     '12345-1-1-ABC',
     '1-2-3-4-5-A-B-C-D-E-F-/-(AB12345)0',
     '12345-1A Abcd']
In [31]: for s in l:
    ...:     match = re.match(r'^([\d-]*)(?![A-Za-z])', s)
    ...:     print(match.group(0).rstrip('-'))
    ...:
12345-1
123
1234567-1
12345
123456
12345
12345-1-1
1-2-3-4-5
12345
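The trailing rstrip('-') can be avoided by consuming a hyphen only when digits follow it. A hedged sketch under that assumption: the helper name extract_id is hypothetical, and the lookahead (?![A-Za-z0-9]) is an added assumption so that a digit group glued to letters (as in '12345-1A') is dropped as a whole rather than cut in the middle:

```python
import re

# Digits, then any number of "-digits" groups. The negative lookahead rejects
# a final group that runs straight into a letter or another digit.
ID_PATTERN = re.compile(r'^\d+(?:-\d+)*(?![A-Za-z0-9])')

def extract_id(s):
    """Return the leading run of hyphen-separated digit groups, or None."""
    m = ID_PATTERN.match(s)
    return m.group() if m else None

samples = ['12345-1-abcde-aBCD',
           '123-Abcdefghi abcdefghijkl',
           '1234567-1-AB-ABC A/1 ABC (AB1234)',
           '12345-ABC-Abcdefghijkl',
           '123456-Abcdefgh',
           '12345-AB1CDE',
           '12345-1-1-ABC',
           '1-2-3-4-5-A-B-C-D-E-F-/-(AB12345)0',
           '12345-1A Abcd']

for s in samples:
    print(extract_id(s))
```

For input with no leading digits the helper returns None, so check the result before using it.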
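Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.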