Python regex: substitute until a certain character
I am looking to replace spaces with commas, but only up to the first / character, and tried the following:
import re
txt = "usera 28935 28876 0 Apr25 ? 00:07:20 /xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx -cp /xxx/yyyy/foo/bar/zzzzz"
rem = re.sub(' +', ' ', txt)  # collapse runs of spaces into a single space
print(re.sub(' ', ',', rem.lstrip()))
But the output puts a comma in place of every space:
usera,28935,28876,0,Apr25,?,00:07:20,/xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx,-cp,/xxx/yyyy/foo/bar/zzzzz
Expected output:
usera,28935,28876,0,Apr25,?,00:07:20,/xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx -cp /xxx/yyyy/foo/bar/zzzzz
i.e. commas should be inserted only up to the first /.
I have tried lookahead and lookbehind but was unable to work this out. Could someone advise me on how to achieve this, please?
Whenever you have a problem like this, consider splitting the string before reaching for a regex:
import re
# split the text once, at the first "/"
a, b = txt.split("/", 1)
# do the replacement in the first half only
a = re.sub(" +", ",", a)
# join them back up, restoring the "/" that split() consumed
result = "{}/{}".format(a, b)
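One caveat with the split version: if txt contains no "/", split() returns a single element and the two-name unpacking raises ValueError. A minimal variant that swaps in str.partition(), which always returns three parts, avoids that. The function name commas_until_slash is hypothetical, not from the original answer:

```python
import re


def commas_until_slash(s: str) -> str:
    # partition() returns (before, "/", after), or (s, "", "") when there is
    # no "/", so unlike unpacking split("/", 1) this never raises ValueError.
    head, sep, tail = s.partition("/")
    # Replace each run of spaces in the head with a single comma;
    # the "/" and everything after it are re-attached unchanged.
    return re.sub(" +", ",", head) + sep + tail
```

With no "/" at all, every run of spaces in the string becomes a comma, since the whole string is treated as the head.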
You can use a lookbehind, but it needs to be variable-length, which the built-in re module does not support. So you'll need the third-party regex module:
>>> import regex
>>> txt = "usera 28935 28876 0 Apr25 ? 00:07:20 /xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx -cp /xxx/yyyy/foo/bar/zzzzz"
>>> regex.sub(r'(?<!/.*) +', ',', txt)
'usera,28935,28876,0,Apr25,?,00:07:20,/xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx -cp /xxx/yyyy/foo/bar/zzzzz'
# or you can use \G
>>> regex.sub(r'\G([^/ ]*+) +', r'\1,', txt)
'usera,28935,28876,0,Apr25,?,00:07:20,/xxx/yyyy/foo/bar/zzzzz/Java/jdk-1.8.0_101/xxx/xxx -cp /xxx/yyyy/foo/bar/zzzzz'
The first one replaces a run of spaces only if the / character does not appear anywhere earlier in the string.
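The standard re module rejects the variable-length lookbehind (?<!/.*) with re.error, which is why the third-party module is needed for that exact pattern. If installing regex is not an option, the same condition (no / earlier in the string) can be sketched with re alone by matching only the prefix before the first / and rewriting it in a replacement function. This swaps in a different technique, and the name commas_before_first_slash is hypothetical:

```python
import re

# Matches from the start of the string up to (not including) the first "/",
# or the whole string if it contains no "/".
PREFIX = re.compile(r"^[^/]*")


def commas_before_first_slash(s: str) -> str:
    # Only the matched prefix is passed to the inner re.sub, so spaces after
    # the first "/" are never touched.
    return PREFIX.sub(lambda m: re.sub(" +", ",", m.group()), s, count=1)
```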
The second one matches a run of characters other than space or /, followed by spaces, as many times as possible starting from the beginning of the string. \G requires each match to begin exactly where the previous one ended, so once a / is reached the chain breaks and nothing after it is touched.
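The standard re module does not support \G, but its role in this pattern can be emulated: a compiled pattern's match(string, pos) only succeeds if the match starts exactly at pos, so feeding it the end of the previous match reproduces the "continue where the last match ended" behaviour. A minimal sketch under that assumption; commas_via_anchored_match is a hypothetical name:

```python
import re

# One step of the \G-style chain: a run of non-space, non-"/" characters
# (captured in group 1) followed by one or more spaces.
STEP = re.compile(r"([^/ ]*) +")


def commas_via_anchored_match(s: str) -> str:
    out = []
    pos = 0
    while True:
        # Pattern.match(s, pos) is anchored at index pos, mimicking \G.
        m = STEP.match(s, pos)
        if m is None:
            # The chain stops at the first "/" or at the end of the string.
            break
        out.append(m.group(1) + ",")
        # Each match consumes at least one space, so pos always advances.
        pos = m.end()
    # Everything after the last successful match is copied unchanged.
    out.append(s[pos:])
    return "".join(out)
```

The explicit loop is longer than the one-line regex.sub call, but it makes visible what \G does: there is no scanning ahead for a later match once the chain is broken.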