Splitting a string on backslashes using regex
For a Python program, I am taking input from stdin, and the input looks something like this:
"-------/--------\---------/------\"
When I print it out as a string value, it is printed as is. I am trying to split the string into a list of strings on forward slashes and backslashes while keeping the separators as well. I have used something like this:
re.split('(\\/)',string)
but the result that I get is:
['------' , '/' , '--------\\\\---------' , '/' , '---------\\\']
I was rather expecting it to be something like:
['------' , '/' , '---------' , '\' , '---------', '/' , '---------' , '\']
What am I doing wrong here, and how can I solve this problem?
To capture a delimiter, it's easier to use findall instead of split:
re.findall(r'[^\\/]+|[\\/]', string)
[^\\/]+ matches one or more consecutive characters that are neither a forward slash nor a backslash. | works as an OR operator. Finally, [\\/] matches a single forward slash or backslash. The result is a list in which every forward slash and backslash is its own element, alongside the runs of text between them.
As for why your code didn't work: your expression is written as '(\\/)'. When the Python interpreter parses this literal, it sees an escaped backslash and creates a string of four characters: ( \ / ). Then this string is sent to the regex engine, which does its own escaping. It sees a backslash followed by a forward slash, and since a forward slash is not special, it "escapes" to itself, so the final expression is just (/). Finally, re applies this expression, splits on the forward slash only and captures it, which is exactly what you're observing.
Due to double escaping, a correct command for your approach would be re.split('([\\\\/])',string): the four backslashes in the literal become two in the string, which the regex engine then reads as one literal backslash inside the character class.
The moral of the story: always use raw literals r"..." with regexes to avoid double escaping issues.
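For comparison, here is a sketch of the split approach written once with a raw literal and once with an equivalent non-raw literal. The variable names are illustrative only.

```python
import re

# Sample input from the question (each \\ is one backslash).
sample = '-------/--------\\---------/------\\'

# Raw literal: the regex engine receives ([\\/]) exactly as typed.
raw_parts = re.split(r'([\\/])', sample)

# Non-raw literal: every backslash meant for the regex must be doubled.
escaped_parts = re.split('([\\\\/])', sample)

print(raw_parts == escaped_parts)  # True
print(raw_parts)
# ['-------', '/', '--------', '\\', '---------', '/', '------', '\\', '']
```

The final empty string appears because the input ends with a delimiter, so split reports the empty piece after it.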
I think this solution gives exactly what you want:
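import re
# Ordinary literal: each \\ is a single backslash in the actual string.
testStr = '-------/--------\\---------/------\\'
# The capture group keeps the delimiters in the result;
# \\\\ in a non-raw literal is one regex-escaped backslash.
parts = re.split('(\\\\|/)', testStr)
for p in parts:
    print('p=' + p)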
Result:
p=-------
p=/
p=--------
p=\
p=---------
p=/
p=------
p=\
p=
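The last line of the result is an empty string because the input ends with a backslash. If that trailing piece is unwanted, one possible tweak (an assumption about what the asker needs, not part of the answer above) is to filter out empty strings:

```python
import re

# Same input as above (each \\ is one backslash).
test_str = '-------/--------\\---------/------\\'

# Drop empty pieces. Note this also removes the empty strings that
# split would produce between two adjacent delimiters.
parts = [p for p in re.split(r'(\\|/)', test_str) if p]

print(len(parts))  # 8
print(parts[-1])   # \
```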