Split string every nth character?

Is it possible to split a string every nth character?

For example, suppose I have a string containing the following:

'1234567890'

How can I get it to look like this:

['12','34','56','78','90']

For the same question with a list, see How do I split a list into equally-sized chunks? The same techniques generally apply, though there are some variations.

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']

Just to be complete, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of chars you can do this:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, to simplify the regex for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

And if the string is long, you can use re.finditer to generate it chunk by chunk.
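The re.finditer suggestion is not shown in code above, so here is a minimal sketch of it. The helper name iter_chunks is hypothetical, and re.DOTALL is an added assumption so that newlines are not skipped by the '.' pattern.

```python
import re

def iter_chunks(text, n):
    """Lazily yield successive chunks of at most n characters (hypothetical helper)."""
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    pattern = re.compile(r'.{1,%d}' % n, flags=re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)

chunks = list(iter_chunks('123456789', 2))
print(chunks)  # ['12', '34', '56', '78', '9']
```

Because the function is a generator, only one chunk is held at a time unless the caller collects them into a list.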

There is already a function in Python's standard library for this:

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
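As the docstring hints, wrap treats whitespace specially, so it is not a drop-in replacement for fixed-width slicing. A small sketch of the difference on a string that contains a space (the sample string is an assumption chosen for illustration):

```python
from textwrap import wrap

s = 'ab cd'
wrapped = wrap(s, 2)
sliced = [s[i:i + 2] for i in range(0, len(s), 2)]

print(wrapped)  # ['ab', 'cd']
print(sliced)   # ['ab', ' c', 'd']
```

For digit-only or otherwise whitespace-free strings the two approaches agree; with spaces, tabs or newlines, wrap breaks at the whitespace and drops it.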

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))  # list() is needed on Python 3, where map is lazy
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().
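One thing to keep in mind with the zip idiom: zip stops at the shortest input, so a final chunk shorter than n is silently dropped. A short sketch showing this next to plain slicing (the nine-character sample string is an assumption for illustration):

```python
s = '123456789'  # nine characters, so the last chunk is incomplete
n = 2

it = iter(s)
# The same iterator is passed n times, so each tuple takes n consecutive characters.
zipped = [''.join(t) for t in zip(*[it] * n)]
sliced = [s[i:i + n] for i in range(0, len(s), n)]

print(zipped)  # ['12', '34', '56', '78']
print(sliced)  # ['12', '34', '56', '78', '9']
```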

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

You could use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

These functions are memory-efficient and work with any iterable.
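The recipe yields tuples of characters rather than strings, so for the original question the groups still need to be joined. A minimal usage sketch for Python 3, assuming an empty-string fillvalue so that a short final chunk is kept without padding:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# fillvalue='' means the padding disappears when the tuple is joined.
chunks = [''.join(group) for group in grouper('123456789', 2, fillvalue='')]
print(chunks)  # ['12', '34', '56', '78', '9']
```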

I was stuck in the same scenario.

This worked for me:

x="1234567890"
n=2
chunks=[]  # avoid naming the variable "list", which would shadow the built-in
for i in range(0,len(x),n):
    chunks.append(x[i:i+n])
print(chunks)

Output:

['12', '34', '56', '78', '90']

Try the following code:

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

s = '1234567890'
# Each piece is a list of characters, so join it back into a string
print([''.join(piece) for piece in split_every(2, s)])

This can be achieved with a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']

>>> from functools import reduce
>>> from operator import add
>>> # Python 3: the built-in zip is lazy and replaces itertools.izip
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']

As always, for those who love one-liners:

n = 2  
line = "this is a line split into n characters"  
line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]

Try this:

s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])

Output:

['12', '34', '56', '78', '90']

A simple recursive solution for short strings (it drops a trailing chunk shorter than n):

def split(s, n):
    if len(s) < n:
        return []
    else:
        return [s[:n]] + split(s[n:], n)

print(split('1234567890', 2))

Or in such a form:

def split(s, n):
    if len(s) < n:
        return []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

which illustrates the typical divide-and-conquer pattern of the recursive approach more explicitly (though in practice it is not necessary to do it this way).
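Both recursive versions above return [] once fewer than n characters remain, so a shorter final chunk is lost. If the tail should be kept, one possible variant is sketched below. The name split_keep_tail is hypothetical, and n is assumed to be at least 1.

```python
def split_keep_tail(s, n):
    """Recursively split s into chunks of n, keeping a shorter final chunk."""
    if not s:
        # Base case: nothing left to split.
        return []
    return [s[:n]] + split_keep_tail(s[n:], n)

print(split_keep_tail('123456789', 2))  # ['12', '34', '56', '78', '9']
```

As with any recursive approach in Python, this is only suitable for short strings, because each chunk adds one level of recursion.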

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

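import more_itertools as mit

s = "1234567890"

# Recent more_itertools releases take the iterable first: grouper(iterable, n)
["".join(c) for c in mit.grouper(s, 2)]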

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

["".join(c) for c in  mit.split_after(s, lambda x: int(x) % 2 == 0)]

Each of these options produces the following output:

['12', '34', '56', '78', '90']

Documentation for the discussed options: grouper, chunked, windowed, split_after

A solution with groupby:

from itertools import groupby, chain, repeat, cycle

text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)

Output:

['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']

These answers are all nice and they work, but the syntax is so cryptic... Why not write a simple function?

def SplitEvery(string, length):
    if len(string) <= length: return [string]
    # Integer division rounded up, so a shorter final chunk is kept
    sections = (len(string) + length - 1) // length
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines

And call it simply:

text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)

# output: ['12', '34', '56', '78', '90']

Another solution using groupby and index//n as the key to group the letters:

from itertools import groupby

text = "abcdefghij"
n = 3

result = []
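# Pair each character with its position so the key can use index // n
for idx, chunk in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(char for _, char in chunk))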

# result = ['abc', 'def', 'ghi', 'j']

Spooky one, tried to invent yet another answer:

def split(s, chunk_size):
    a = zip(*[s[i::chunk_size] for i in range(chunk_size)])
    return [''.join(t) for t in a]

print(split('1234567890', 1))
print(split('1234567890', 2))
print(split('1234567890', 3))

Out

['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
['12', '34', '56', '78', '90']
['123', '456', '789']

def split(s, n):
  """
  Split string every nth character

  Parameters
  ----------
  s: string
  n: value of nth
  """
  new_list = []
  for i in range(0, len(s), n):
    new_list.append(s[i:i+n])
  return new_list

print(split('1234567890', 2))

I know this question is old, but this is the shortest way to do it I'm aware of:

def split_every_n(S, n):
  # Integer division (//) is required on Python 3, since range() needs an int
  return [S[i*n:(i+1)*n] for i in range(len(S) // n)]

This, however, assumes that the length of your string is a multiple of n. Otherwise, you'd have to pad it.
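The padding step mentioned above is left to the reader, so here is one way it could look. The helper name split_padded and the fill parameter are hypothetical, introduced only for this sketch.

```python
def split_padded(s, n, fill=' '):
    """Pad s to a multiple of n with fill, then split into equal chunks."""
    # Round the length up to the next multiple of n.
    target = -(-len(s) // n) * n
    padded = s.ljust(target, fill)
    return [padded[i * n:(i + 1) * n] for i in range(len(padded) // n)]

print(split_padded('1234567', 3, fill='_'))  # ['123', '456', '7__']
```

If padding is not wanted, slicing with range(0, len(S), n) keeps the shorter final chunk instead.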

One possibility is to use regular expressions:

import re
re.findall(r"\w{3}", your_string)  # raw string, so \w is not treated as a string escape
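Note that \w{3} only matches runs of exactly three word characters, so spaces, punctuation and a short final chunk are skipped. A sketch comparing it with a more permissive pattern (the sample string and the use of re.DOTALL are assumptions for illustration):

```python
import re

your_string = 'ab cd-efgh'

# Only full runs of three word characters are matched.
strict = re.findall(r'\w{3}', your_string)
# Any character counts, and the last chunk may be shorter than three.
permissive = re.findall(r'.{1,3}', your_string, flags=re.DOTALL)

print(strict)      # ['efg']
print(permissive)  # ['ab ', 'cd-', 'efg', 'h']
```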

This might be a little more clear:

##Define your string
mystring = '1234567890'

##Define your starting index
start = 0
##Define the end of your index for the first slice
end = 2

##Create an empty list
mylist =[]

##While the current slice is not empty, keep going
while len(mystring[start:end])>0:
    ##Add to the list
    mylist.append(mystring[start:end])
    ##Move the index up for the beginning and ending of the slice
    start+=2
    end+=2

def splitstr(oldstr,n):
    start = 0
    end = n
    newlist =[]
    while len(oldstr[start:end])>0:
        newlist.append(oldstr[start:end])
        start+=n
        end+=n
    return newlist
print(splitstr('1234567890', 2))

I've got this code that I use whenever I need to do this:

def split_string(n, st):
    lst = [""]
    for i in str(st):
        l = len(lst) - 1
        if len(lst[l]) < n: 
            lst[l] += i
        else:
            lst += [i]
    return lst

print(split_string(3, "test_string."))

Where:

  • n is the length of each list item
  • st is the string to be split up
  • lst is the list of chunks built from st
  • i is the current character being used in st
  • l is the index of the last list item

Here is another solution for a more general case where the chunks are not of equal length. If a length is 0, all of the remaining part is returned.

data is the sequence to be split; fieldsize is a tuple with the list of field lengths.

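def fieldsplit(data=None, fieldsize=()):
    tmpl = []
    for pp in fieldsize:
        if pp > 0:
            # Take the next pp items and keep the rest for later fields
            tmpl.append(data[:pp])
            data = data[pp:]
        else:
            # A length of 0 means "everything that is left"
            tmpl.append(data)
            break
    return tuple(tmpl)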
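A self-contained usage sketch of the same idea, assuming the loop is meant to consume data field by field. The timestamp string and the field widths are made up for illustration.

```python
def fieldsplit(data, fieldsize=()):
    """Split data into fields of the given widths; a width of 0 takes the rest."""
    fields = []
    for width in fieldsize:
        if width > 0:
            fields.append(data[:width])
            data = data[width:]
        else:
            # Width 0: append whatever remains and stop.
            fields.append(data)
            break
    return tuple(fields)

print(fieldsplit('2024-01-15T10:30', (4, 1, 2, 1, 2, 0)))
# ('2024', '-', '01', '-', '15', 'T10:30')
```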

I am using this:

list(''.join(s) for s in zip(my_str[::2], my_str[1::2]))

For any other chunk size n, zip n slices instead of two (my_str[0::n], my_str[1::n], and so on). Note that zip drops a trailing chunk shorter than n.

This question reminds me of the Perl 6 .comb(n) method. It breaks up strings into n-sized chunks. (There's more to it than that, but I'll leave out the details.)

It's easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: [s[i:i+n] for i in range(0,len(s),n)]

Then you can call it like this:

comb('1234567', 2)   # returns ['12', '34', '56', '7']

This comb() function will also operate on lists (to produce a list of lists):

comb(['cat', 'dog', 'bird'], 2)  # returns [['cat', 'dog'], ['bird']]
