Splitting string using backslash using regex

For a Python program, I am taking input from stdin that looks something like this:

"-------/--------\---------/------\"

When I print it as a string value, it is printed as is. I am trying to split the string into a list of strings on forward slashes and backslashes while keeping the separators as well. I have used something like this:

re.split('(\\/)',string)

but the result that I get is:

['-------', '/', '--------\\---------', '/', '------\\']

I was rather expecting it to be something like:

['------' , '/' , '---------' , '\' , '---------', '/' , '---------' , '\']

What am I doing wrong here, and how can I solve this problem?

To capture a delimiter, it's easier to use findall instead of split:

re.findall(r'[^\\/]+|[\\/]', string)

[^\\/]+ matches one or more consecutive characters that are neither a forward slash nor a backslash. | works as an "or" operator. Finally, [\\/] matches a single forward slash or backslash. The result contains a separate substring for each slash and for each run of characters between slashes.
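
As a minimal sketch of the findall approach, assuming the sample input from the question (the variable names `sample` and `tokens` are only for illustration):

```python
import re

# Sample input from the question. The backslashes are doubled because this
# is a regular (non-raw) literal, and a raw literal cannot end in a backslash.
sample = '-------/--------\\---------/------\\'

# Either a run of non-slash characters, or a single slash of either kind.
tokens = re.findall(r'[^\\/]+|[\\/]', sample)
print(tokens)
```

Because this pattern never matches an empty string, findall does not produce an empty trailing element when the input ends with a delimiter.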

As for why your code didn't work, your expression is (\\/) . When the Python interpreter parses this, it sees an escaped backslash and creates a string of four characters: ( \ / ) . Then, this string is sent to the regex engine, which also does escaping. It sees a backslash followed by a slash, and since the slash is not special, it "escapes" to itself, so the final expression is just (/) . Finally, re applies this expression, splits by a forward slash and captures it, which is exactly what you're observing.
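
The two layers of escaping can be checked directly. This is a small sketch with illustrative variable names, not part of the original answer:

```python
import re

pattern = '(\\/)'      # the non-raw literal from the question
print(len(pattern))    # the literal holds four characters: ( \ / )
print(pattern)

sample = '-------/--------\\---------/------\\'

# The regex engine reads \/ as a plain forward slash, so only "/" splits
# the string and the backslashes stay inside the pieces.
parts = re.split(pattern, sample)
print(parts)
```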

The correct command for your approach would be re.split('([\\\\\\/])',string) due to double escaping: once by the Python string literal and once by the regex engine.

The moral of the story: always use raw string literals r"..." with regexes to avoid double escaping issues.
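
To illustrate, here is a sketch comparing the non-raw pattern with a raw-literal equivalent. Both describe a character class that matches either kind of slash. The variable names are only for illustration:

```python
import re

sample = '-------/--------\\---------/------\\'

# Non-raw literal: Python turns the six backslashes into three, and the
# regex engine then reads the class as "backslash or forward slash".
non_raw = re.split('([\\\\\\/])', sample)

# Raw literal: the regex engine sees exactly what is written.
raw = re.split(r'([\\/])', sample)

print(raw)
print(non_raw == raw)
```

The raw version needs only one level of escaping, so the pattern in the source matches what the regex engine receives.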

I think this solution gives exactly what you want:

import re
testStr = '-------/--------\\---------/------\\'
parts = re.split('(\\\\|/)', testStr)
for p in parts:
    print('p=' + p)

Result:

p=-------
p=/
p=--------
p=\
p=---------
p=/
p=------
p=\
p=
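
The final empty p= line appears because the input ends with a delimiter, so split returns an empty string after it. If that is unwanted, one option is to filter out empty pieces. This is a sketch of an addition, not part of the answer above:

```python
import re

test_str = '-------/--------\\---------/------\\'

# Same split as above, written with a raw literal, dropping empty strings.
parts = [p for p in re.split(r'(\\|/)', test_str) if p]
print(parts)
```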
