Splitting a string on backslashes using a regular expression

Question

For a Python program, I am reading input from stdin, and the input looks something like this:

"-------/--------\---------/------\"

When I print it as a string value, it is printed as is. I am trying to split the string into a list of strings on forward slashes and backslashes while keeping the separators as well. I have used something like this:

re.split('(\\/)',string)

but the result that I get is:

['------' , '/' , '--------\\---------' , '/' , '---------\\']

I was rather expecting it to be something like:

['------' , '/' , '---------' , '\' , '---------', '/' , '---------' , '\']

What am I doing wrong here, and how can I solve this problem?

Answer 1

To capture a delimiter, it's easier to use findall instead of split:

re.findall(r'[^\\/]+|[\\/]', string)

[^\\/]+ matches one or more consecutive characters that are neither a forward slash nor a backslash. | works as an "or" operator. Finally, [\\/] matches a single forward slash or backslash. The result is a list in which every slash is its own element and the runs of text between the slashes are the remaining elements.
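To see the pattern at work, here is a minimal sketch that applies it to the sample input from the question. The variable names sample and tokens are only illustrative, and the sample is written as a normal literal with each backslash doubled because a raw literal cannot end in a single backslash.

```python
import re

# Sample input from the question; each backslash is doubled in the literal.
sample = '-------/--------\\---------/------\\'

# Either a run of characters that are not slashes, or one slash of either kind.
tokens = re.findall(r'[^\\/]+|[\\/]', sample)

for token in tokens:
    print(repr(token))

# Nothing is lost: joining the tokens gives back the input.
print(''.join(tokens) == sample)
```

Because findall only returns what matched, no empty string appears at the end even though the input ends with a separator.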

As for why your code didn't work, your expression is (\\/) . When the Python interpreter parses this, it sees an escaped backslash and creates a string of four characters: ( \ / ) . Then, this string is sent to the regex engine, which also does escaping. It sees a backslash followed by a slash, and since a slash is not special, it "escapes" to itself, so the final expression is just (/) . Finally, re applies this expression, splits on the forward slash and captures it, which is exactly what you're observing.
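The two escaping layers can be checked directly. This is a minimal sketch; the short string sample is made up for illustration and is not the input from the question.

```python
import re

# The non-raw literal from the question.
pattern = '(\\/)'

# Layer 1: Python turns the doubled backslash into one, leaving 4 characters.
print(len(pattern))
print(list(pattern))

# Layer 2: the regex engine reads backslash + slash as a plain slash,
# so only '/' acts as a separator and the backslash stays in the data.
sample = 'a/b\\c'
parts = re.split(pattern, sample)
print(parts)
```

The backslash in the data survives inside the last element, which mirrors the unsplit chunks in the question's output.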

The correct command for your approach would be re.split('([\\\\/])',string) due to double escaping.

The moral of the story: always use raw literals r"..." with regexes to avoid double escaping issues.
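As a small check of that advice, the sketch below compares a raw-literal pattern with its double-escaped equivalent and uses it with split. The names raw_pattern, escaped_pattern and sample are illustrative.

```python
import re

# The same regex written two ways: raw literal vs. doubled escaping.
raw_pattern = r'([\\/])'
escaped_pattern = '([\\\\/])'
print(raw_pattern == escaped_pattern)

# Sample input from the question; each backslash is doubled in the literal.
sample = '-------/--------\\---------/------\\'

# The capturing group makes split keep the separators in the result.
parts = re.split(raw_pattern, sample)
print(parts)
```

Both literals produce the same seven-character pattern; the raw form simply needs half as many backslashes.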

Answer 2

I think this solution gives exactly what you want:

import re
testStr = '-------/--------\\---------/------\\'
parts = re.split('(\\\\|/)', testStr)
for p in parts:
    print('p=' + p)

Result:

p=-------
p=/
p=--------
p=\
p=---------
p=/
p=------
p=\
p=
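The final empty p= line appears because the input ends with a separator, so split returns an empty string after it. If that is unwanted, one option (an addition here, not part of the answer above) is to filter out empty pieces, sketched below with the same testStr:

```python
import re

testStr = '-------/--------\\---------/------\\'

# Split on either slash, keep the separators, and drop empty pieces.
parts = [p for p in re.split(r'(\\|/)', testStr) if p]

for p in parts:
    print('p=' + p)
```

This yields the same eight pieces as before without the trailing empty entry.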

Answer 1
3, accepted, 2014-04-29 09:23:35

Answer 2
0, 2017-08-30 10:17:09
