
Split string at nth occurrence of a given character

Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?

Given a string:

'20_231_myString_234'

It should be split into (with the delimiter being '_', after its second occurrence):

['20_231', 'myString_234']

Or is the only way to accomplish this to count, split and join?

>>> n = 2
>>> text = '20_231_myString_234'
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')

This seems like the most readable way; the alternative is a regex.

Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:

import re

n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n - 1), s)
if m:
    print(m.groups())

or have a nice function:

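import re

def nthofchar(s, c, n):
    # Escape c so that regex metacharacters such as '.' are matched literally.
    e = re.escape(c)
    regex = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (e, e, n - 1, e, e)
    m = re.match(regex, s)
    # Return an empty tuple when s has fewer than n occurrences of c.
    return m.groups() if m else ()

s = '20_231_myString_234'
print(nthofchar(s, '_', 2))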

Or without regexes, using iterative find:

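def nth_split(s, delim, n):
    # Walk forward to the nth occurrence of delim.
    # str.index raises ValueError if there are fewer than n occurrences.
    p, c = -len(delim), 0
    while c < n:
        p = s.index(delim, p + len(delim))
        c += 1
    # Slice by len(delim) so multi-character delimiters also work.
    return s[:p], s[p + len(delim):]

s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)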

I like this solution because it needs no hand-written regex pattern and can easily be adapted to another "nth" or delimiter.

import re

string = "20_231_myString_234"
occur = 2  # the occurrence at which you want to split

indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]

print(part1, ' ', part2)

I thought I would contribute my two cents. The second parameter to split() limits the number of splits performed, so everything after the nth delimiter is left intact as the last element:

def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r
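To make the effect of that second argument concrete, here is a small sketch of what str.split returns with and without it, using the sample string from the question. split_at is repeated only so the snippet runs on its own.

```python
def split_at(s, delim, n):
    # Same function as above, repeated so this snippet is self-contained.
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r

s = '20_231_myString_234'

# Without a limit, every delimiter is used.
print(s.split('_'))         # ['20', '231', 'myString', '234']

# With a limit of 2, splitting stops after two delimiters.
print(s.split('_', 2))      # ['20', '231', 'myString_234']

print(split_at(s, '_', 2))  # ('20_231', 'myString_234')
```

Note that indexing with [n] raises IndexError when the string contains fewer than n delimiters, so callers may want to handle that case.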

On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.

It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:

def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]

Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.

I put up my testing script online. You are welcome to review and comment.
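The script itself is not reproduced here. As a rough idea of how such a comparison could be set up, the following sketch times three of the approaches from this thread with timeit. It is not the original script: the name nth_regex and the iteration count are assumptions made for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r

def nth_split(s, delim, n):
    # Iterative find, as in the earlier answer.
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]

def nth_regex(s, delim, n):
    # Hypothetical name for the regex approach from the earlier answer.
    d = re.escape(delim)
    pattern = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (d, d, n - 1, d, d)
    m = re.match(pattern, s)
    return m.groups() if m else ()

s = '20_231_myString_234'

# All three should agree before their timings are compared.
assert split_at(s, '_', 2) == nth_split(s, '_', 2) == nth_regex(s, '_', 2)

for func in (split_at, nth_split, nth_regex):
    seconds = timeit.timeit(lambda: func(s, '_', 2), number=20000)
    print(func.__name__, round(seconds, 4))
```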

It depends on the pattern you are splitting on. If, for example, the first two elements are always numbers, you can build a regular expression and use the re module, which can split your string as well.

>>> import re
>>> s = '20_231_myString_234'

>>> occurrences = [m.start() for m in re.finditer('_', s)]  # list of the positions of '_'
>>> occurrences
[2, 6, 15]
>>> result = [s[:occurrences[1]], s[occurrences[1]+1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']

I had a larger string to split at every nth occurrence of a character, and ended up with the following code:

# Split every 6 spaces
n = 6
sep = ' '
n_split_groups = []

groups = err_str.split(sep)
while len(groups):
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]

print(n_split_groups)

Thanks @perreal!

Here is @AllBlackt's solution in function form:

def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while len(groups):
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups

s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_nth(s, " ", 2))

['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
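The same grouping can also be written by stepping through the split pieces in strides of n, which avoids re-slicing the remaining list on every pass. This is only a sketch of an alternative; the name split_every_nth is made up for illustration.

```python
def split_every_nth(s, sep, n):
    # Hypothetical name; joins the split pieces back together n at a time.
    groups = s.split(sep)
    return [sep.join(groups[i:i + n]) for i in range(0, len(groups), n)]

s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_every_nth(s, " ", 2))
# ['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
```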

As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter maxsplit to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:

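text = '20_231_myString_234'
# rsplit counts from the right, so this gives '20_231' only because
# the string contains exactly three delimiters.
first_part = text.rsplit('_', 2)[0]  # Gives '20_231'
second_part = text.split('_', 2)[2]  # Gives 'myString_234'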

This is not only simple, but also avoids performance hits of regex solutions and other solutions using join to undo unnecessary splits.
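One thing to keep in mind is that rsplit counts delimiters from the right, so the first expression matches "before the second delimiter" only when the string has exactly three delimiters. The sketch below shows the difference on a longer, made-up string and one possible way to derive the head from the tail instead; that last step is an assumption added for illustration, not part of the answer above.

```python
n = 2
text = '20_231_myString_234'
longer = '20_231_myString_234_extra'  # made-up example with four delimiters

# With exactly three delimiters, both expressions behave as described.
print(text.rsplit('_', n)[0])    # 20_231
print(text.split('_', n)[n])     # myString_234

# With four delimiters, rsplit counts from the right, so the head changes.
print(longer.rsplit('_', n)[0])  # 20_231_myString
print(longer.split('_', n)[n])   # myString_234_extra

# Deriving the head from the tail works for any number of delimiters >= n.
tail = longer.split('_', n)[n]
head = longer[:len(longer) - len(tail) - len('_')]
print(head, tail)                # 20_231 myString_234_extra
```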

