
Is Python's str.split() inconsistent?

>>> ".a string".split('.')
['', 'a string']

>>> "a .string".split('.')
['a ', 'string']

>>> "a string.".split('.')
['a string', '']

>>> "a ... string".split('.')
['a ', '', '', ' string']

>>> "a ..string".split('.')
['a ', '', 'string']

>>> 'this  is a test'.split(' ')
['this', '', 'is', 'a', 'test']

>>> 'this  is a test'.split()
['this', 'is', 'a', 'test']

Why is split() different from split(' ') when the string it is called on contains only spaces as whitespace?

Why does split('.') turn the "..." in "a ... string" into the two empty strings '', ''? split() does not produce an empty word between two separators...

The docs are clear about this (see @agf below), but I'd like to know why this behaviour was chosen.

I have looked at the source code (here) and thought the comparison on line 136 should be a plain less-than: ... i < str_len ...

See the str.split docs; this behavior is specifically mentioned:

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
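To make the two rules quoted above concrete, here is a rough pure-Python sketch of each. The function names split_on_sep and split_on_whitespace are hypothetical, and this is not how CPython implements str.split; it only mirrors the documented behaviour for simple inputs.

```python
def split_on_sep(text, sep):
    """Sketch of the 'sep is given' rule: every occurrence of sep ends a field,
    so two adjacent separators enclose an empty string."""
    if not sep:
        raise ValueError("empty separator")
    fields = []
    start = 0
    while True:
        idx = text.find(sep, start)
        if idx == -1:
            # No more separators: the remainder (possibly '') is the last field.
            fields.append(text[start:])
            return fields
        fields.append(text[start:idx])
        start = idx + len(sep)


def split_on_whitespace(text):
    """Sketch of the 'sep is None' rule: a run of whitespace is one separator,
    and leading or trailing whitespace produces no empty fields."""
    fields = []
    current = []
    for ch in text:
        if ch.isspace():
            if current:  # close the word in progress, ignore extra whitespace
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


print(split_on_sep("a ... string", "."))       # ['a ', '', '', ' string']
print(split_on_sep("", ","))                   # ['']
print(split_on_whitespace("this  is a test"))  # ['this', 'is', 'a', 'test']
print(split_on_whitespace("   "))              # []
```

The empty-input cases fall out of the loops naturally: with a separator there is always one final field, even if it is empty, whereas the whitespace version only emits a field once it has seen a non-whitespace character.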

Python tries to do what you would expect. Most people not thinking too hard would probably expect

'1 2 3 4 '.split() 

to return

['1', '2', '3', '4']

Think about splitting data where spaces have been used instead of tabs to create fixed-width columns: if the values have different widths, there will be a different number of spaces between columns in each row.

There is often trailing whitespace at the end of a line that you can't see, and the default ignores that as well, so it gives you the answer you'd visually expect.
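A small illustration of the fixed-width case, using made-up rows (the data and the variable names are assumptions for the example only):

```python
# Space-padded columns; some rows carry invisible trailing spaces.
rows = [
    "alice    30   London   ",
    "bob      7    Paris",
    "charlotte 112 Rome  ",
]

# Default split: runs of spaces collapse, trailing spaces are ignored.
parsed = [row.split() for row in rows]
print(parsed[0])  # ['alice', '30', 'London']
print(parsed[2])  # ['charlotte', '112', 'Rome']

# Splitting on a single space keeps every gap as an empty string.
naive = [row.split(' ') for row in rows]
print(naive[1])
# ['bob', '', '', '', '', '', '7', '', '', '', 'Paris']
```

With split() every row comes back as three clean fields regardless of the padding, which is what you would read off the screen.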

When it comes to the algorithm used when a delimiter is specified, think about a row in a CSV file:

1,,3

means there is data in the first and third columns and none in the second, so you would want

'1,,3'.split(',')

to return

['1', '', '3']

otherwise you wouldn't be able to tell what column each string came from.
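As a quick sketch of why the empty string matters, pair the fields with column names (the header names here are invented for the example):

```python
header = "col1,col2,col3".split(",")
fields = "1,,3".split(",")

# The empty string keeps '3' aligned with the third column.
record = dict(zip(header, fields))
print(record)  # {'col1': '1', 'col2': '', 'col3': '3'}

# If empty fields were dropped, '3' would slide into the second column.
collapsed = [f for f in fields if f]
shifted = dict(zip(header, collapsed))
print(shifted)  # {'col1': '1', 'col2': '3'}
```

For real CSV files with quoted fields, the standard library's csv module is usually a better fit than str.split, but the positional argument is the same.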
