Split a string using multiple delimiters

Question

I want to split a string using Python. I have managed to do it for one delimiter, but I am finding it hard to do it for two.

The string:

Paragraph 4-2 says. i am going home$ early- Yes.

I need the output to be:

Paragraph 4-2 says
i am going home 
early
Yes

The sentence should be split on ., $ and - (but when - is between two numbers, as in 4-2, it shouldn't split).

How can I do this?

text.split('.')

UPDATE

The new output should look like:

Paragraph 4-2 says.
i am going home$ 
early-
Yes.

Answer 1

>>> import re
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes'
>>>
>>> re.split(r'(?<!\d)\s*[.$-]\s*(?!\d)', s)
['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']

\s*[.$-]\s* matches any of ., $ or - surrounded by zero or more whitespace characters (\s*).
(?<!\d) is a negative lookbehind that ensures the match is not preceded by a digit.
(?!\d) is a negative lookahead that ensures the match is not followed by a digit.
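
As a small sketch of how these lookarounds behave, the pattern above can be wrapped in a helper and tried on a couple of inputs. The helper name split_sentences, the filtering of empty pieces, and the second sample string are assumptions added for illustration; they are not part of the original answer.

```python
import re

# Same pattern as above: a delimiter (., $ or -) with optional surrounding
# whitespace, not preceded and not followed by a digit.
DELIMS = re.compile(r'(?<!\d)\s*[.$-]\s*(?!\d)')

def split_sentences(text):
    """Split text on ., $ or - and drop empty pieces (hypothetical helper)."""
    return [part for part in DELIMS.split(text) if part]

# The trailing '.' would otherwise leave an empty string at the end.
print(split_sentences('Paragraph 4-2 says. i am going home$ early- Yes.'))
# ['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']

# A '.' between two digits is left alone, just like the '-' in 4-2.
print(split_sentences('Version 3.5 is out. Get it now'))
# ['Version 3.5 is out', 'Get it now']
```

Filtering out empty strings is a design choice for this sketch: re.split returns an empty final element whenever the text ends with a delimiter.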

You can read more about lookarounds here.

Answer 2

>>> import re
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes'
>>> re.split(r'(?<=\D)[.$-](?=\D|$)', s)
['Paragraph 4-2 says', ' i am going home', ' early', ' Yes']

(?<=\D)[.$-](?=\D|$) matches ., $ or - only when it is preceded by a non-digit and followed by a non-digit or the end of the string. The lookahead and lookbehind are zero-width, so they don't consume any characters. The string is therefore split on the delimiter alone, and the characters around it are left in the result.

Edit:
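
To make the "doesn't consume" point concrete, here is a minimal sketch comparing the lookaround pattern with a plain version that matches the neighbouring characters directly. The consuming pattern \D[.$-]\D and the final strip step are assumptions added for comparison, not something from the answer itself.

```python
import re

s = 'Paragraph 4-2 says. i am going home$ early- Yes'

# Consuming version: the neighbouring non-digit characters are part of the
# match, so re.split removes them together with the delimiter.
consuming = re.split(r'\D[.$-]\D', s)

# Lookaround version: the neighbours are only checked, never consumed.
lookaround = re.split(r'(?<=\D)[.$-](?=\D|$)', s)

# Optional clean-up of the leading spaces the lookaround version leaves.
stripped = [part.strip() for part in lookaround]

print(consuming)   # ['Paragraph 4-2 say', 'i am going hom', 'earl', 'Yes']
print(lookaround)  # ['Paragraph 4-2 says', ' i am going home', ' early', ' Yes']
print(stripped)    # ['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
```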

>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes.'
>>> re.findall(r'.*?(?<=\D)[.$-](?=\D|$)', s)
['Paragraph 4-2 says.', ' i am going home$', ' early-', ' Yes.']
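
One thing to watch with the findall approach: a trailing piece that does not end in a delimiter is silently dropped. The sketch below shows this, along with one possible variant that also accepts the end of the string as a terminator. The KEEP_TAIL pattern is an assumption of mine, not part of the original answer.

```python
import re

# Pattern from the edit above: keeps the delimiter at the end of each piece.
KEEP = re.compile(r'.*?(?<=\D)[.$-](?=\D|$)')

# Assumed variant (not from the original answer): also accept end of string
# as a terminator, and require at least one character per piece.
KEEP_TAIL = re.compile(r'.+?(?:(?<=\D)[.$-](?=\D|$)|$)')

no_final_dot = 'Paragraph 4-2 says. i am going home$ early- Yes'

print(KEEP.findall(no_final_dot))
# ['Paragraph 4-2 says.', ' i am going home$', ' early-']

print(KEEP_TAIL.findall(no_final_dot))
# ['Paragraph 4-2 says.', ' i am going home$', ' early-', ' Yes']
```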

Answer 3

You can do this:

>>> import re
>>> st='Paragraph 4-2 says. i am going home$ early- Yes.'
>>> [m.group(1) for m in re.finditer(r'(.*?[.$\-])(?:\s+|$)',st)]
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']

If you are not going to modify the match group at all (with strip or similar), you can also just use findall with the same regex:

>>> re.findall(r'(.*?[.$\-])(?:\s+|$)',st)
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']

The regex is explained here, but in summary:

(.*?[.$\-])  is the capture group containing:
 .*?          Any character (except newline) 0 to infinite times [lazy] 
    [.$\-]   Character class matching .$- one time

(?:\s+|$)    Non-capturing Group containing:
   \s+        First alternate: Whitespace [\t \r\n\f] 1 to infinite times [greedy] 
      |        or
       $      Second alternate: end of string

Depending on your strings, you may need to change the regex to (.*?[.$\-])(?:[ ]+|$) if you don't want \s to match \r, \n or \f.
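
To see what the choice between \s+ and [ ]+ changes in practice, here is a small sketch. The sample string with an embedded newline is an assumption added for illustration.

```python
import re

# Both patterns are from the answer above; only the separator part differs.
ANY_WS = re.compile(r'(.*?[.$\-])(?:\s+|$)')      # any whitespace may follow
SPACES = re.compile(r'(.*?[.$\-])(?:[ ]+|$)')     # only spaces may follow

# Assumed sample: the first sentence ends with a newline instead of a space.
text = 'Paragraph 4-2 says.\ni am going home$ early- Yes.'

print(ANY_WS.findall(text))
# ['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']

# With [ ]+ the '.' before the newline no longer counts as a terminator,
# and since . does not match a newline, the first line is not returned.
print(SPACES.findall(text))
# ['i am going home$', 'early-', 'Yes.']
```

So switching to [ ]+ does not just ignore the newline: a sentence terminated by \n simply stops matching, which may or may not be what you want.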


3 answers

Answer 1: score 5, 2013-07-27 15:54:24
Answer 2: score 4, 2013-07-27 15:54:49
Answer 3: score 1, accepted, 2013-07-27 16:38:47
