Splitting a String using multiple delimiters
I want to split a string using Python. I have successfully done it for one delimiter, but I'm finding it hard to do it for more than one.
The string:
Paragraph 4-2 says. i am going home$ early- Yes.
I need the output to be:
Paragraph 4-2 says
i am going home
early
Yes
The sentence should be split on ., $ and - (but when the - is between two numbers, as in 4-2, it shouldn't split).
How can I do this? This is what I use for a single delimiter:
text.split('.')
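A minimal sketch of the problem, using the string from the question: str.split accepts only one separator, and a plain regex character class handles all three delimiters but also cuts 4-2 in half, which is why the rule about numbers needs extra handling.

```python
import re

text = 'Paragraph 4-2 says. i am going home$ early- Yes.'

# str.split takes a single separator string, so only '.' is handled.
by_dot = text.split('.')

# A character class covers ., $ and - at once, but it also splits the
# hyphen inside "4-2", which should be kept intact.
naive = re.split(r'\s*[.$-]\s*', text)

print(by_dot)  # ['Paragraph 4-2 says', ' i am going home$ early- Yes', '']
print(naive)   # ['Paragraph 4', '2 says', 'i am going home', 'early', 'Yes', '']
```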
UPDATE
The new output should look like this (keeping the delimiters):
Paragraph 4-2 says.
i am going home$
early-
Yes.
>>> import re
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes'
>>>
>>> re.split(r'(?<!\d)\s*[.$-]\s*(?!\d)', s)
['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
\s*[.$-]\s* matches any one of ., $ or -, surrounded by zero or more whitespace characters (\s*).
(?<!\d) is a negative lookbehind that ensures the match is not preceded by a digit.
(?!\d) is a negative lookahead that ensures the match is not followed by a digit.
You can read more about lookarounds here.
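The session above uses a string without the final period. With the question's original string, which ends in "Yes.", the last delimiter is followed by the end of the string, so re.split also returns an empty last element. A minimal sketch that filters it out, assuming empty pieces are never wanted:

```python
import re

text = 'Paragraph 4-2 says. i am going home$ early- Yes.'

# Same pattern as above: ., $ or - with optional surrounding whitespace,
# not preceded or followed by a digit.
pattern = r'(?<!\d)\s*[.$-]\s*(?!\d)'

# The trailing '.' yields an empty final piece, so drop empty strings.
parts = [part for part in re.split(pattern, text) if part]
print(parts)  # ['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
```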
>>> re.split(r'(?<=\D)[.$-](?=\D|$)', s)
['Paragraph 4-2 says', ' i am going home', ' early', ' Yes']
>>>
(?<=\D)[.$-](?=\D|$) matches ., $ or - only when it is preceded by a non-digit and followed by a non-digit or the end of the string.
The lookbehind and lookahead don't consume any characters, so the string is split only at the ., $ or - itself, and the characters around it (such as the digits in 4-2) are left alone. This is also why the pieces keep their leading spaces.
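Because the lookarounds leave the surrounding characters in place, the pieces still carry their leading spaces, and the question's string (ending in "Yes.") also produces an empty last piece. A minimal sketch of tidying both up with str.strip, assuming surrounding whitespace is never significant:

```python
import re

text = 'Paragraph 4-2 says. i am going home$ early- Yes.'

# Lookarounds only check the neighbouring characters without consuming
# them, so the space after each delimiter stays in the next piece.
raw_parts = re.split(r'(?<=\D)[.$-](?=\D|$)', text)

# Strip the leftover spaces and drop the empty piece after the final '.'.
parts = [p.strip() for p in raw_parts if p.strip()]

print(raw_parts)  # ['Paragraph 4-2 says', ' i am going home', ' early', ' Yes', '']
print(parts)      # ['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
```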
Edit:
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes.'
>>> re.findall(r'.*?(?<=\D)[.$-](?=\D|$)', s)
['Paragraph 4-2 says.', ' i am going home$', ' early-', ' Yes.']
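The findall result keeps each delimiter but still has the space that followed the previous one. A minimal sketch that strips it to get exactly the output from the UPDATE:

```python
import re

text = 'Paragraph 4-2 says. i am going home$ early- Yes.'

# Lazily take characters up to a ., $ or - that has a non-digit before it
# and a non-digit (or the end of the string) after it. findall returns the
# whole match, so the delimiter stays attached to its piece.
pieces = re.findall(r'.*?(?<=\D)[.$-](?=\D|$)', text)

# Every piece after the first starts with a space; strip it off.
sentences = [piece.strip() for piece in pieces]
print(sentences)  # ['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']
```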
You can do this:
>>> import re
>>> st='Paragraph 4-2 says. i am going home$ early- Yes.'
>>> [m.group(1) for m in re.finditer(r'(.*?[.$\-])(?:\s+|$)',st)]
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']
If you are not going to modify the match group at all (with strip() or similar), you can also just use findall with the same regex:
>>> re.findall(r'(.*?[.$\-])(?:\s+|$)',st)
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']
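One thing to watch: this regex requires a delimiter at the end of every piece, so text after the last delimiter is silently dropped. With the string from the first answer (no period after "Yes") the final word disappears. A sketch of a different technique, splitting only on whitespace that directly follows a delimiter, keeps the delimiters, leaves 4-2 alone (no whitespace after its hyphen) and keeps a trailing piece; it assumes delimiters are always followed by whitespace, as in the question:

```python
import re

# String from the first answer: no '.' after "Yes".
st = 'Paragraph 4-2 says. i am going home$ early- Yes'

# The findall pattern needs a delimiter to close each piece,
# so the final "Yes" is lost.
dropped = re.findall(r'(.*?[.$\-])(?:\s+|$)', st)

# Alternative: split on whitespace preceded by ., $ or -.
# The fixed-width lookbehind keeps the delimiter in the left piece.
kept = re.split(r'(?<=[.$\-])\s+', st)

print(dropped)  # ['Paragraph 4-2 says.', 'i am going home$', 'early-']
print(kept)     # ['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes']
```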
The regex is explained here, but in summary:
(.*?[.$\-])   Capture group containing:
    .*?       Any character (except newline), 0 to unlimited times [lazy]
    [.$\-]    Character class matching one of ., $ or -, exactly once
(?:\s+|$)     Non-capturing group containing:
    \s+       First alternative: whitespace (space, \t, \r, \n, \f, \v), 1 to unlimited times [greedy]
    |         or
    $         Second alternative: end of string
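The same breakdown can live inside the pattern itself. A sketch using re.VERBOSE, in which unescaped whitespace and # comments outside character classes are ignored, so each part of the regex carries its own explanation:

```python
import re

# The same regex as above, annotated with re.VERBOSE.
SENTENCE = re.compile(r"""
    (            # capture group:
        .*?      #   any characters except newline, as few as possible
        [.$\-]   #   then one of the delimiters ., $ or -
    )
    (?:\s+|$)    # non-capturing: whitespace, or the end of the string
""", re.VERBOSE)

st = 'Paragraph 4-2 says. i am going home$ early- Yes.'
sentences = SENTENCE.findall(st)
print(sentences)  # ['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']
```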
Depending on your strings, you may need to change the regex to (.*?[.$\-])(?:[ ]+|$) if you don't want \s to match \r, \n, \f or other whitespace: [ ]+ only accepts spaces after a delimiter.
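A small check of the difference between the two variants, using a hypothetical input with a form feed (\f) after the first delimiter. With \s+ the form feed counts as a separator; with [ ]+ it does not, and because . also matches \f, both sentences end up in one piece:

```python
import re

text = 'One.\fTwo.'  # hypothetical input: a form feed after the first '.'

# \s+ accepts any whitespace after a delimiter, including \f.
any_space = re.findall(r'(.*?[.$\-])(?:\s+|$)', text)

# [ ]+ accepts only spaces, so the lazy .*? has to run on to the last '.'.
spaces_only = re.findall(r'(.*?[.$\-])(?:[ ]+|$)', text)

print(any_space)    # ['One.', 'Two.']
print(spaces_only)  # ['One.\x0cTwo.']
```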