简体   繁体   中英

Using split function in python3.5

Trying to split the string at number 7 and I want 7 to be included in the second part of the split string.

Code:

a = 'cats can jump up to 7 times their tail length'

words = a.split("7")

print(words)

Output:

['cats can jump up to ', ' times their tail length']

String got split but second part doesn't include 7.

I want to know how I can include 7.

note: not a duplicate of Python split() without removing the delimiter because the separator has to be part of the second string.

A simple and naive way to do this is just to find the index of what you want to split on and slice it:

>>> a = 'cats can jump up to 7 times their tail length'
>>> ind = a.index('7')
>>> a[:ind], a[ind:]
('cats can jump up to ', '7 times their tail length')

Another way is to use str.partition :

a = 'cats can jump up to 7 times their tail length'
print(a.partition('7'))
# ('cats can jump up to ', '7', ' times their tail length')

To join the number again with the latter part you can use str.join :

x, *y = a.partition('7')
y = ''.join(y)
print((x, y))
# ('cats can jump up to ', '7 times their tail length')

Or do it manually:

sep = '7'
x = a.split(sep)
x[1] = sep + x[1]
print(tuple(x))
# ('cats can jump up to ', '7 times their tail length')

in one line, using re.split with the rest of the string, and filter the last, empty string that re.split leaves:

import re
a = 'cats can jump up to 7 times their tail length'
print([x for x in re.split("(7.*)",a) if x])

result:

['cats can jump up to ', '7 times their tail length']

using () in split regex tells re.split not to discard the separator. A (7) regex would have worked but would have created a 3-item list like str.partition does, and would have required some post processing, so no one-liner.

now if the number isn't known, regex is (again) the best way to do it. Just change the code to:

[x for x in re.split("(\d.*)",a) if x]

re can be used to capture globally as well:

>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> sep = '7'
>>> 
>>> [i for i in re.split(f'({sep}[^{sep}]*)', s) if i]
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

If the f-string is hard to read, note that it just evaluates to (7[^7]*) .
(To the same end as the listcomp one can use list(filter(bool, ...)) , but it's comparatively quite ugly)


In Python 3.7 and onward, re.split() allows splitting on zero-width patterns. This means a lookahead regex, namely f'(?={sep})' , can be used instead of the group shown above.

What's strange about this is the timings: if using re.split() (ie without a compiled pattern object), the group solution consistently runs about 1.5x faster than the lookahead. However, when compiled, the lookahead beats the other hands-down:

In [4]: r_lookahead = re.compile('f(?={sep})')

In [5]: r_group = re.compile(f'({sep}[^{sep}]*)')

In [6]: %timeit [i for i in r_lookahead.split(s) if i]
2.76 µs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit [i for i in r_group.split(s) if i]
5.74 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit [i for i in r_lookahead.split(s * 512) if i]
137 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit [i for i in r_group.split(s * 512) if i]
1.88 ms ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A recursive solution also works fine, although more slowly than splitting on a compiled regex (but faster than a straight re.split(...) ):

def splitkeep(s, sep, prefix=''):
    start, delim, end = s.partition(sep)
    return [prefix + start, *(end and splitkeep(end, sep, delim))]
>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> 
>>> splitkeep(s, '7')
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

Using enumerate, This only works if the string doesnt start with the seperator

s = 'The quick 7 the brown foxes jumped 7 times over 7 lazy dogs'

separator = '7'
splitted = s.split(separator)

res = [((separator if i > 0 else '') + item).strip() for i, item in enumerate(splitted)]

print(res)
['The quick', '7 the brown foxes jumped', '7 times over', '7 lazy dogs']

[Program finished]

There's also the possibility to do all of it using split and list comprehension, without the need to import any library. This will, however, make your code slightly "less pretty":

a = 'cats can jump up to 7 times their tail length'
sep = '7'
splitString = a.split(sep)
splitString = list(splitString[0]) + [sep+x for x in splitString[1:]]

And with that, splitString will carry the value:

['cats can jump up to ', '7 times their tail length']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM