
python re.split with maxsplit argument

When using re.split, I'd expect maxsplit to be the length of the returned list minus one.

The examples in the docs suggest so.

But when there is a capture group (and maybe in some other cases), I don't understand how the maxsplit argument works.

>>> re.split(r"(\W+)", "Words, words, words.", maxsplit=1)
['Words', ', ', 'words, words.']

>>> re.split("(:)", ":a:b::c", maxsplit=2)
['', ':', 'a', ':', 'b::c']
>>> re.split("((:))", ":a:b::c", maxsplit=2)
['', ':', ':', 'a', ':', ':', 'b::c']

What am I missing?

It's not about maxsplit; it's about the parentheses in your regular expression:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

DOCS: https://docs.python.org/3/library/re.html#re.split

So my understanding is that maxsplit determines the maximum number of splits, and the capturing parentheses add the text of each group to the list as extra elements.

Example
":a:b::c" with maxsplit=2 splits your string into three parts:
"", "a", "b::c"

But because the pattern "(:)" also contains a capturing group, the captured text is returned in between the parts: "", ":", "a", ":", "b::c"
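The difference between the two results can be checked directly. This is a minimal sketch using only the standard re module; the variable names are illustrative and not from the original post.

```python
import re

text = ":a:b::c"

# Without a capturing group, maxsplit=2 gives at most three parts.
parts = re.split(":", text, maxsplit=2)
print(parts)  # ['', 'a', 'b::c']

# With one capturing group, each matched separator is inserted between the parts.
with_seps = re.split("(:)", text, maxsplit=2)
print(with_seps)  # ['', ':', 'a', ':', 'b::c']

# Every second item is a part; the items in between are the captured separators.
assert with_seps[::2] == parts
```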

If the pattern is "((:))", then each colon is returned twice in between the parts, once for each capturing group.
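Put as a count: if k splits are performed and the pattern has g capturing groups, the list should have (k + 1) + k * g items. The sketch below checks this for the patterns discussed here; expected_length is a hypothetical helper written for this illustration, not part of re. It also shows that a non-capturing group, (?:...), groups without adding items.

```python
import re

text = ":a:b::c"

def expected_length(splits_done, groups):
    # Hypothetical helper: k splits give k + 1 parts,
    # plus one extra item per capturing group for each split.
    return (splits_done + 1) + splits_done * groups

for pattern in (":", "(:)", "((:))"):
    groups = re.compile(pattern).groups  # number of capturing groups
    result = re.split(pattern, text, maxsplit=2)
    assert len(result) == expected_length(2, groups)
    print(pattern, result)

# A non-capturing group (?:...) does not add anything to the result.
plain = re.split("(?::)", text, maxsplit=2)
print(plain)  # ['', 'a', 'b::c']
```

If the separators are not needed in the output, dropping the parentheses or switching to (?:...) restores the "length equals maxsplit + 1" behaviour the question expected.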


 