Insert space between specific characters but not if followed by specific characters regex
Using Python regex, I wish to insert a space between alpha characters and numerals (the alpha will always precede the numeral), but not between numerals and hyphens or between numerals and underscores.
Ideally, I'd like it to replace all such occurrences on the line (see the 3rd sample string below), but even just doing the first one would be great.
I've gotten this far:
import re
item = "Bob Ro1-1 Fred"
txt = re.sub(r"(.*)(\d)", r"\1 \2", item)
print(txt)  # prints: Bob Ro1- 1 Fred (desired: Bob Ro 1-1 Fred)
I've tried sticking a ? in various places to ungreedify the search, but haven't yet found the magic.
Sample strings:
Original ==> Desired output
1. "Bob Ro1 Sam cl3" ==> "Bob Ro 1 Sam cl 3"
2. "Some Guy ro1-1 Sam" ==> "Some Guy ro 1-1 Sam"
3. "ribbet ribbit ro3_2 bob wow cl1-3" ==> "ribbet ribbit ro 3_2 bob wow cl 1-3"
You may use
re.sub(r'([^\W\d_])(\d)', r'\1 \2', s)
See the regex demo.
A variation using lookarounds:
re.sub(r'(?<=[^\W\d_])(?=\d)', ' ', s)
The ([^\W\d_])(\d) regex matches any single letter followed by a digit, capturing the letter into Group 1 and the digit into Group 2. Then, the \1 \2 replacement pattern inserts the letter in Group 1, a space, and the digit in Group 2 into the resulting string.
The (?<=[^\W\d_])(?=\d) pattern matches the location between a letter and a digit without consuming either character, and thus the replacement string only contains a space.
See the Python demo:
import re
strs = ['Bob Ro1-1 Fred', 'Bob Ro1 Sam cl3', 'Some Guy ro1-1 Sam', 'ribbet ribbit ro3_2 bob wow cl1-3']
rx = re.compile(r'([^\W\d_])(\d)')
for s in strs:
    # Capturing-group version: re-insert the letter and digit around a space
    print(rx.sub(r'\1 \2', s))
    # Lookaround version: insert a space at the zero-width position
    print(re.sub(r'(?<=[^\W\d_])(?=\d)', ' ', s))
Output:
Bob Ro 1-1 Fred
Bob Ro 1-1 Fred
Bob Ro 1 Sam cl 3
Bob Ro 1 Sam cl 3
Some Guy ro 1-1 Sam
Some Guy ro 1-1 Sam
ribbet ribbit ro 3_2 bob wow cl 1-3
ribbet ribbit ro 3_2 bob wow cl 1-3
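For contrast, here is a small sketch of why adding ? to the original pattern does not get there on its own. The variable names are just for illustration. A greedy (.*) backtracks only as far as the last digit, while a lazy (.*?) stops at the first digit but then re.sub keeps going and also matches the digit after the hyphen.

```python
import re

item = "Bob Ro1-1 Fred"

# Greedy: (.*) grabs as much as it can, so (\d) ends up on the LAST digit.
greedy = re.sub(r"(.*)(\d)", r"\1 \2", item)

# Lazy: (.*?) stops at the first digit, but re.sub continues scanning,
# so the digit after the hyphen gets a space in front of it as well.
lazy = re.sub(r"(.*?)(\d)", r"\1 \2", item)

# Letter-then-digit: only a digit directly preceded by a letter is affected.
fixed = re.sub(r"([^\W\d_])(\d)", r"\1 \2", item)

print(greedy)  # Bob Ro1- 1 Fred
print(lazy)    # Bob Ro 1- 1 Fred
print(fixed)   # Bob Ro 1-1 Fred
```

The point is that greediness is not the real problem: the pattern has to say what must come right before the digit.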
You need a lookahead following a lookbehind:
(?<=[a-zA-Z])(?=[0-9])
The code should be:
re.sub(r"(?<=[a-zA-Z])(?=[0-9])", r" ", item)
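A rough comparison of this ASCII-only pattern with the [^\W\d_] / \d lookaround variant, in case the input might contain non-ASCII letters. The sample strings "café1" and "ro_1" and the names ascii_rx, unicode_rx and results are made up for illustration and are not from the question.

```python
import re

# ASCII-only classes
ascii_rx = re.compile(r"(?<=[a-zA-Z])(?=[0-9])")
# Unicode-aware classes: [^\W\d_] is "a word character that is neither
# a digit nor an underscore", i.e. a letter
unicode_rx = re.compile(r"(?<=[^\W\d_])(?=\d)")

samples = ["ro1-1", "café1", "ro_1"]  # hypothetical inputs
results = {s: (ascii_rx.sub(" ", s), unicode_rx.sub(" ", s)) for s in samples}

for s, (a, u) in results.items():
    print(s, "->", a, "|", u)
# ro1-1 -> ro 1-1 | ro 1-1
# café1 -> café1 | café 1
# ro_1 -> ro_1 | ro_1
```

For plain ASCII data the two behave the same, so which one to use mostly depends on whether accented or non-Latin letters can show up.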