Python re.split with maxsplit argument
When using re.split, I'd expect maxsplit to be the length of the returned list minus 1. The examples in the docs suggest so.
But when there is a capture group (and maybe in some other cases), I don't understand how the maxsplit argument works.
>>> import re
>>> re.split(r"(\W+)", "Words, words, words.", maxsplit=1)
['Words', ', ', 'words, words.']
>>> re.split("(:)", ":a:b::c", maxsplit=2)
['', ':', 'a', ':', 'b::c']
>>> re.split("((:))", ":a:b::c", maxsplit=2)
['', ':', ':', 'a', ':', ':', 'b::c']
What am I missing?
It's not about maxsplit, it's about your use of capturing parentheses in the regular expression:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
Docs: https://docs.python.org/3/library/re.html#re.split
So as I understand it, maxsplit limits the number of splits. Each capturing group then adds its matched text as an extra element between the resulting pieces.
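One way to check this reading is to predict the list length. With s splits and g capturing groups, re.split should return s + 1 pieces plus s * g captured strings. The helper split_length below is hypothetical and is not part of the re module. It assumes maxsplit >= 0 and a pattern that never matches the empty string, because empty matches make the split count harder to predict.

```python
import re

def split_length(pattern: str, string: str, maxsplit: int = 0) -> int:
    """Hypothetical helper: predict len(re.split(pattern, string, maxsplit=...)).

    Assumes maxsplit >= 0 and a pattern that never matches the empty string.
    """
    compiled = re.compile(pattern)
    matches = sum(1 for _ in compiled.finditer(string))
    # maxsplit=0 means "no limit"; otherwise at most maxsplit splits happen
    splits = matches if maxsplit == 0 else min(maxsplit, matches)
    # splits + 1 text pieces, plus one element per capturing group per split
    return (splits + 1) + splits * compiled.groups

for pattern in [":", "(:)", "((:))"]:
    result = re.split(pattern, ":a:b::c", maxsplit=2)
    assert len(result) == split_length(pattern, ":a:b::c", maxsplit=2)
    print(pattern, result)
```

Each of the three patterns performs exactly two splits. The group count changes only how many extra elements sit between the pieces.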
Example: ":a:b::c" with maxsplit=2 splits your string into three parts: "", "a", "b::c"
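For comparison, here is the same split without any capturing group. Only capturing parentheses contribute elements, so a non-capturing group (?:...) adds nothing to the result either. This is a small illustration, not an exhaustive list of cases.

```python
import re

# No group: just the three pieces produced by two splits
assert re.split(":", ":a:b::c", maxsplit=2) == ["", "a", "b::c"]
# A non-capturing group (?:...) is not returned either
assert re.split("(?::)", ":a:b::c", maxsplit=2) == ["", "a", "b::c"]
```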
But because the pattern "(:)" also contains a capturing group, the text it captured is returned in between the parts: "", ":", "a", ":", "b::c"
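The single group in "(:)" captures the whole match, so the list alternates between pieces and separators, and slicing can pull them apart. This sketch assumes a pattern with exactly one capturing group.

```python
import re

parts = re.split("(:)", ":a:b::c", maxsplit=2)
pieces = parts[0::2]      # text between the matches: ['', 'a', 'b::c']
separators = parts[1::2]  # text captured by the group: [':', ':']
# The group captures the entire match, so joining everything restores the input
assert "".join(parts) == ":a:b::c"
```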
If the pattern is "((:))", then each colon is returned twice in between the parts, once for each of the two groups: "", ":", ":", "a", ":", ":", "b::c"
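More generally, each split contributes one element per capturing group. The stride through the list is therefore pattern.groups + 1. Below is a sketch for "((:))", assuming every group participates in each match. A group that does not participate shows up as None instead.

```python
import re

pattern = re.compile("((:))")
parts = pattern.split(":a:b::c", maxsplit=2)
stride = pattern.groups + 1  # one text piece, then one element per group
pieces = parts[::stride]     # ['', 'a', 'b::c']
# Captured text of both groups for each of the two splits
captured = [tuple(parts[i + 1:i + stride]) for i in range(0, len(parts) - 1, stride)]
print(pieces, captured)
```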