
Why does split() return more elements than split(" ") on the same string?

I am using split() and split(" ") on the same string, but split(" ") returns fewer elements than split(). Why? I want to know for which specific inputs this happens.

str.split called with None (or with no argument at all) splits on every whitespace character, not just the space you type with your spacebar.

In [457]: text = 'this\nshould\rhelp\tyou\funderstand'

In [458]: text.split()
Out[458]: ['this', 'should', 'help', 'you', 'understand']

In [459]: text.split(' ')
Out[459]: ['this\nshould\rhelp\tyou\x0cunderstand']

A list of all the whitespace characters that split(None) splits on can be found in the question "All the Whitespace Characters? Is it language independent?".
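As a rough way to see which characters are involved, here is a minimal sketch that lists the ASCII characters for which str.isspace() is true and checks that split() treats each of them as a separator. The variable names and the cutoff at 128 are illustrative choices only; Unicode defines more whitespace characters beyond ASCII.

```python
# Collect the ASCII characters that Python considers whitespace.
ascii_whitespace = [chr(i) for i in range(128) if chr(i).isspace()]
print([repr(c) for c in ascii_whitespace])

# Every one of them separates words for split(),
# but only ' ' does so for split(' ').
for c in ascii_whitespace:
    text = "a" + c + "b"
    assert text.split() == ["a", "b"]
    assert (len(text.split(" ")) == 2) == (c == " ")
```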

If you run help() on the split() method, you'll see this:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings

Return a list of the words in the string S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.

Therefore, the difference between the two is that split() without a delimiter removes empty strings from the result, while split() with an explicit delimiter keeps them.
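The help text also mentions maxsplit, and the same difference shows up there. A small sketch, using a made-up input string:

```python
text = "  a  b  c"

# No separator: leading whitespace is skipped, and after one split
# the remainder of the string is kept as a single element.
no_sep = text.split(None, 1)

# Explicit ' ': the very first space is a split point, so the first
# element is an empty string.
with_sep = text.split(" ", 1)

print(no_sep)    # ['a', 'b  c']
print(with_sep)  # ['', ' a  b  c']
```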

The method str.split called without arguments has a somewhat different behaviour.

First it splits by any whitespace character.

'foo bar\nbaz\tmeh'.split() # ['foo', 'bar', 'baz', 'meh']

But it also removes the empty strings from the output list.

' foo bar '.split(' ') # ['', 'foo', 'bar', '']

' foo bar '.split() # ['foo', 'bar']
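To tie this back to the question, the two effects pull in opposite directions. The sketch below uses a hypothetical helper, compare_splits, to count the elements each call returns: split() returns more elements when words are separated by whitespace other than a plain space, and fewer when the string has leading, trailing or repeated spaces.

```python
def compare_splits(text):
    """Return (len(text.split()), len(text.split(' '))) for one input."""
    return len(text.split()), len(text.split(" "))

# Tabs and newlines: only split() sees them as separators.
print(compare_splits("a\tb\nc"))  # (3, 1)

# Single spaces only: both calls agree.
print(compare_splits("a b c"))    # (3, 3)

# Leading, trailing and doubled spaces: split(' ') keeps empty strings.
print(compare_splits(" a  b "))   # (2, 5)
```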

In Python, the split function splits on a specific string if one is given, otherwise on runs of whitespace (and you can then index the resulting list as usual):

s = "Hello world! How are you?"
s.split()
Out[9]: ['Hello', 'world!', 'How', 'are', 'you?']
s.split("!")
Out[10]: ['Hello world', ' How are you?']
s.split("!")[0]
Out[11]: 'Hello world'

In my experience, most of the confusion comes from the different ways split() treats whitespace.

Passing a separator such as ' ' rather than None triggers different behavior in split(). According to the Python documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

Below is an example in which the sample string has a trailing space ' ', the same whitespace character that is passed to the second split() call. The two calls therefore behave differently not because of a mismatch in whitespace characters, but because of how the method was designed to work. That design may be convenient in common scenarios, but it can confuse people who expect split() to just split.

>>> sample = "a b "
>>> sample.split()
['a', 'b']
>>> sample.split(' ')
['a', 'b', '']
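If you want to keep an explicit ' ' separator but still drop the empty strings, one possible approach is to filter them out afterwards. This is only a sketch: split_on_spaces_drop_empty is a hypothetical helper name, and the result matches split() only when the string contains no whitespace other than spaces.

```python
def split_on_spaces_drop_empty(text):
    # Split on single spaces, then discard the empty strings produced by
    # leading, trailing, or repeated spaces.
    return [part for part in text.split(" ") if part]

sample = "a b "
print(split_on_spaces_drop_empty(sample))  # ['a', 'b']

# A tab is not a space, so it stays inside the element here,
# whereas split() would separate on it.
tabbed = "a\tb "
print(split_on_spaces_drop_empty(tabbed))  # ['a\tb']
print(tabbed.split())                      # ['a', 'b']
```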


 