
Split string every nth character?

Is it possible to split a string every nth character?

For example, suppose I have a string containing the following:

'1234567890'

How can I get it to look like this:

['12','34','56','78','90']

For the same question with a list, see "How do I split a list into equally-sized chunks?". The same techniques generally apply, though there are some variations.

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']

Just to be complete, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of characters, make the second character optional:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, to simplify the regex for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

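If the string is long, you can use re.finditer to generate the chunks one at a time instead of building the whole list at once. Note that '.' does not match newline characters unless the re.DOTALL flag is set.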
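A minimal sketch of the re.finditer idea, assuming Python 3. The helper name iter_chunks is hypothetical, and re.DOTALL is added here as an assumption so that '.' also matches newline characters:

```python
import re

def iter_chunks(text, n):
    """Lazily yield successive pieces of text, each at most n characters long."""
    # re.DOTALL lets '.' match newlines too; without it they would be skipped.
    pattern = re.compile(".{1,%d}" % n, re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)

for chunk in iter_chunks("123456789", 2):
    print(chunk)
```

Because this is a generator, only one chunk is held in memory at a time.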

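The standard library already has a function that does this for strings without whitespace: textwrap.wrap. Keep in mind that it is meant for wrapping text, so it breaks lines at whitespace and drops whitespace at line boundaries; for arbitrary strings, slicing is the safer choice.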

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
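As the docstring suggests, wrap treats whitespace specially. A small sketch comparing it with plain slicing on a string that contains spaces (the sample string is only an illustration):

```python
from textwrap import wrap

s = "ab cd ef"

# wrap() breaks at whitespace and drops it at line boundaries.
wrapped = wrap(s, 3)

# Plain slicing keeps every character, spaces included.
sliced = [s[i:i + 3] for i in range(0, len(s), 3)]

print(wrapped)  # ['ab', 'cd', 'ef']
print(sliced)   # ['ab ', 'cd ', 'ef']
```

Joining the sliced pieces gives back the original string, while the wrapped pieces have lost the spaces.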

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']

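This idiom comes straight from the documentation for zip(). Note that zip() stops at the shortest input, so a trailing chunk shorter than n is silently dropped.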
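The trick relies on passing the same iterator object to zip() several times. A short sketch of what happens, written out for n = 2 with an odd-length sample string:

```python
s = "1234567"
it = iter(s)

# zip(it, it) pulls two items from the same iterator for every tuple,
# so consecutive characters end up paired.
pairs = ["".join(pair) for pair in zip(it, it)]

print(pairs)  # ['12', '34', '56']; the unpaired trailing '7' is dropped
```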

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

You could use the grouper() recipe from itertools :

Python 2.x:

from itertools import izip_longest    

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

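These functions are memory-efficient and work with any iterable. Note that they yield tuples, so for a string each group still has to be joined (''.join(group)) to get a string back.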
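A possible way to adapt the recipe to strings, assuming Python 3. The name chunk_string is hypothetical; using an empty string as the fill value is a choice made here so that the padding disappears when the group is joined:

```python
from itertools import zip_longest

def chunk_string(s, n):
    """Split s into strings of length n; the last one may be shorter."""
    args = [iter(s)] * n
    # fillvalue='' pads the final tuple with empty strings, which vanish on join.
    return ["".join(group) for group in zip_longest(*args, fillvalue="")]

print(chunk_string("1234567", 3))  # ['123', '456', '7']
```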

I was stuck in the same scenario.

This worked for me:

x = "1234567890"
n = 2
chunks = []  # avoid naming this 'list', which would shadow the built-in
for i in range(0, len(x), n):
    chunks.append(x[i:i+n])
print(chunks)

Output

['12', '34', '56', '78', '90']

Try the following code:

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

s = '1234567890'
# Each piece is a list of characters, so join it back into a string
print([''.join(piece) for piece in split_every(2, s)])

This can be achieved by a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']

>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']

As always, for those who love one-liners:

n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n+n] for i, _ in enumerate(line[::n])]

Try this:

s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])

Output:

['12', '34', '56', '78', '90']

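A simple recursive solution for short strings (each chunk adds one level of recursion, so very long inputs can hit Python's recursion limit):

def split(s, n):
    if not s:
        return []
    else:
        # The last chunk may be shorter than n
        return [s[:n]] + split(s[n:], n)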

print(split('1234567890', 2))

Or in this form:

def split(s, n):
    if not s:
        return []
    elif len(s) <= n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

This form illustrates the typical divide-and-conquer pattern of a recursive approach more explicitly (though in practice it is not necessary to do it this way).

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

import more_itertools as mit

s = "1234567890"

# Recent versions of more_itertools take the iterable first, then n
["".join(c) for c in mit.grouper(s, 2)]

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

["".join(c) for c in  mit.split_after(s, lambda x: int(x) % 2 == 0)]

Each of the latter options produces the following output:

['12', '34', '56', '78', '90']

Documentation for the options discussed: grouper, chunked, windowed, split_after

A solution with groupby :

from itertools import groupby, chain, repeat, cycle

text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)

Output:

['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']

These answers are all nice and working and all, but the syntax is so cryptic... Why not write a simple function?

def SplitEvery(string, length):
    if len(string) <= length: return [string]
    # Ceiling division, so a shorter trailing section is kept as well
    sections = -(-len(string) // length)
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines

And call it simply:

text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)

# output: ['12', '34', '56', '78', '90']

Another solution using groupby and index//n as the key to group the letters:

from itertools import groupby

text = "abcdefghij"
n = 3

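result = []
# Pair each character with its position and group by position // n
for _, chunk in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(char for _, char in chunk))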

# result = ['abc', 'def', 'ghi', 'j']

Spooky one – tried to invent yet another answer:

def split(s, chunk_size):
    a = zip(*[s[i::chunk_size] for i in range(chunk_size)])
    return [''.join(t) for t in a]

print(split('1234567890', 1))
print(split('1234567890', 2))
print(split('1234567890', 3))

Out

['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
['12', '34', '56', '78', '90']
['123', '456', '789']

def split(s, n):
  """
  Split string every nth character

  Parameters
  ----------
  s: string
  n: value of nth
  """
  new_list = []
  for i in range(0, len(s), n):
    new_list.append(s[i:i+n])
  return new_list

print(split('1234567890', 2))

I know this question is old, but this is the shortest way to do it I'm aware of:

def split_every_n(S, n):
  return [S[i*n:(i+1)*n] for i in range(len(S) // n)]

This, however, assumes that the length of your string is a multiple of n. Otherwise, you'd have to pad it.
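One way the padding could be done, as a sketch only: the function name split_padded and the '_' fill character are assumptions made for illustration, not part of the answer above.

```python
def split_padded(s, n, fill="_"):
    """Pad s to a multiple of n with fill, then cut it into n-sized pieces."""
    padded_len = -(-len(s) // n) * n   # round len(s) up to a multiple of n
    s = s.ljust(padded_len, fill)      # fill must be a single character
    return [s[i * n:(i + 1) * n] for i in range(len(s) // n)]

print(split_padded("1234567", 3))  # ['123', '456', '7__']
```

If the padding is unwanted, stepping with range(0, len(S), n) and slicing S[i:i+n] keeps a shorter final chunk instead.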

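One possibility is to use regular expressions. Note that \w only matches word characters (letters, digits and underscore), and any trailing characters that do not fill a whole group of three are left out:

import re
re.findall(r"\w{3}", your_string)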
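To see the effect of \w on input that contains other characters, here is a small comparison. The sample string and the alternative pattern with re.DOTALL are illustrative assumptions:

```python
import re

text = "ab-cdefgh"

# \w{3} needs three consecutive word characters, so 'ab-' is skipped.
word_chunks = re.findall(r"\w{3}", text)

# .{1,3} with re.DOTALL accepts any character and keeps a shorter final chunk.
any_chunks = re.findall(r".{1,3}", text, re.DOTALL)

print(word_chunks)  # ['cde', 'fgh']
print(any_chunks)   # ['ab-', 'cde', 'fgh']
```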

This might be a little clearer:

##Define your string
mystring = '1234567890'

##Define your starting index
start = 0
##Define the end of your index for the first slice
end = 2

##Create an empty list
mylist =[]

##While the current slice still has characters in it, keep going
while len(mystring[start:end])>0:
    ##Add to the list
    mylist.append(mystring[start:end])
    ##Move the index up for the beginning and ending of the slice
    start+=2
    end+=2

def splitstr(oldstr,n):
    start = 0
    end = n
    newlist =[]
    while len(oldstr[start:end])>0:
        newlist.append(oldstr[start:end])
        start+=n
        end+=n
    return newlist
print(splitstr('1234567890', 2))

I've got this code that I use whenever I need to do this:

def split_string(n, st):
    lst = [""]
    for i in str(st):
        l = len(lst) - 1
        if len(lst[l]) < n: 
            lst[l] += i
        else:
            lst += [i]
    return lst

print(split_string(3, "test_string."))

Where:

  • n is the length of each list item
  • st is the string to be split up
  • lst is the list of chunks built from st
  • i is the current character being used in st
  • l is the index of the last list item

Here is another solution for the more general case where the chunks are not of equal length. If a length is 0, the entire remainder is returned as the last field.

data is the sequence to be split; fieldsize is a tuple of field lengths.

def fieldsplit(data=None, fieldsize=()):
    tmpl = []
    for pp in fieldsize:
        if pp > 0:
            # Take the next pp items and keep the rest for later fields
            tmpl.append(data[:pp])
            data = data[pp:]
        else:
            # A length of 0 means "everything that is left"
            tmpl.append(data)
            break
    return tuple(tmpl)

I am using this:

list(''.join(s) for s in zip(my_str[::2], my_str[1::2]))

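This drops the last character when the string has an odd length. For another chunk size n, you need n slices (my_str[0::n] through my_str[n-1::n]) instead of two.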

This question reminds me of the Perl 6 .comb(n) method. It breaks up strings into n -sized chunks. (There's more to it than that, but I'll leave out the details.)

It's easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: [s[i:i+n] for i in range(0,len(s),n)]

Then you can call it like this:

comb('1234567', 2)   # returns ['12', '34', '56', '7']

This comb() function will also operate on lists (to produce a list of lists):

comb(['cat', 'dog', 'bird'], 2)  # returns [['cat', 'dog'], ['bird']]
