
Better way to get multiple tokens from a string? (Python 2)

If I have a string:

"The quick brown fox jumps over the lazy dog!"

I will often use the split() function to tokenize the string.

testString = "The quick brown fox jumps over the lazy dog!"
testTokens = testString.split(" ")

This will give me a list:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog!']

If I want to remove the first token and keep the REST of the tokens intact, I will do something like this to make it a one-liner:

newString = " ".join(testString.split(' ')[1:]) # "quick brown fox jumps over the lazy dog!"

Or, if I want a certain range:

newString = " ".join(testString.split(' ')[2:4]) # "brown fox"
newString = " ".join(testString.split(' ')[:3]) # "The quick brown"

Of course, I may want to split on something other than a space:

testString = "So.long.and.thanks.for.all.the.fish!"
testTokens = testString.split('.')

newString = ".".join(testString.split('.')[3:]) # "thanks.for.all.the.fish!"

Is this the BEST way to accomplish this? Or is there a more efficient or more readable way?

Note that split can take an optional second argument, signifying the maximum number of splits that should be made:

>>> testString.split(' ', 1)[1]
'quick brown fox jumps over the lazy dog!'

This is much better than " ".join(testString.split(' ')[1:]) whenever it can be applied, because it stops after the first split and does not have to re-join the remaining tokens.

Thank you @abarnert for pointing out that .split(' ', 1)[1] raises an IndexError if the string contains no spaces. See str.partition if that poses an issue.
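Here is a minimal sketch of the partition approach mentioned above. The helper name rest_after_first is made up for illustration; str.partition itself is a standard string method that always returns a 3-tuple, so nothing is raised when the separator is missing.

```python
def rest_after_first(text, sep=" "):
    # Hypothetical helper, not from the original post.
    # partition returns (head, sep, tail); if sep is absent, the result is
    # (text, '', ''), so tail is simply an empty string.
    head, found, tail = text.partition(sep)
    return tail


test_string = "The quick brown fox jumps over the lazy dog!"
rest = rest_after_first(test_string)
no_space = rest_after_first("fox")

print(rest)            # quick brown fox jumps over the lazy dog!
print(repr(no_space))  # ''
```

Whether an empty string is the right result for input without a separator depends on your use case; you could return the original text instead.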


Furthermore, there is also an rsplit method, so you can use:

>>> testString.rsplit(' ', 6)[0]
'The quick brown'

instead of " ".join(testString.split(' ')[:3]).
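rsplit is probably most convenient when you count tokens from the right-hand end. A small sketch, using the same sentence as above (the variable names are just for illustration):

```python
test_string = "The quick brown fox jumps over the lazy dog!"

# Drop the last token: split once, starting from the right, and keep the head.
without_last = test_string.rsplit(" ", 1)[0]

# Keep only the last two tokens: split twice from the right, discard the head.
last_two = " ".join(test_string.rsplit(" ", 2)[1:])

print(without_last)  # The quick brown fox jumps over the lazy
print(last_two)      # lazy dog!
```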

Your current method is perfectly fine. You can gain a very slight performance boost by limiting the number of splits. For example:

>>> ' '.join(testString.split(' ', 4)[2:4])
'brown fox'
>>> ' '.join(testString.split(' ', 3)[:3])
'The quick brown'
>>> ' '.join(testString.split(' ', 1)[1:])
'quick brown fox jumps over the lazy dog!'
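If you do this a lot, the pattern above could be wrapped in a small function. This is only a sketch: token_slice is a hypothetical name, and it assumes non-negative start and stop indexes (negative indexes would need the full split).

```python
def token_slice(text, sep, start, stop):
    """Return tokens[start:stop] re-joined with sep (non-negative indexes only)."""
    # Splitting at most `stop` times yields tokens 0..stop-1 plus one unsplit
    # tail, so nothing past the requested range is ever tokenized.
    return sep.join(text.split(sep, stop)[start:stop])


sentence = "The quick brown fox jumps over the lazy dog!"
dotted = "So.long.and.thanks.for.all.the.fish!"

print(token_slice(sentence, " ", 2, 4))  # brown fox
print(token_slice(sentence, " ", 0, 3))  # The quick brown
print(token_slice(dotted, ".", 3, 5))    # thanks.for
```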

Note that for smaller strings the difference will be negligible, so you should probably stick with your simpler code. Here is an example of the minimal timing difference:

In [2]: %timeit ' '.join(testString.split(' ', 4)[2:4])
1000000 loops, best of 3: 752 ns per loop

In [3]: %timeit ' '.join(testString.split(' ')[2:4])
1000000 loops, best of 3: 886 ns per loop
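If you are not using IPython, roughly the same comparison can be run with the standard timeit module. This is a sketch; the loop count of 100000 is an arbitrary choice, and the numbers will vary with your machine and Python version.

```python
import timeit

setup = 'testString = "The quick brown fox jumps over the lazy dog!"'

# Time the split limited to 4 splits against the unlimited split.
limited = timeit.timeit("' '.join(testString.split(' ', 4)[2:4])",
                        setup=setup, number=100000)
unlimited = timeit.timeit("' '.join(testString.split(' ')[2:4])",
                          setup=setup, number=100000)

print("limited split:   %.4f s per 100000 calls" % limited)
print("unlimited split: %.4f s per 100000 calls" % unlimited)
```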
