
How to split a string into a fixed number of parts in Python?

I know about textwrap.wrap, but that method splits a string into parts of a fixed length. I'm looking for a Python function that splits a string into a fixed number of parts.

For example: string = "Hello, my name is foo"
and foo(string, 7)
returns ['Hel', 'lo,', ' my', ' na', 'me ', 'is ', 'foo']

Algorithmically, I know how to implement this, but I want to know whether there is a module that provides it, or a "magic function" in the regex module that solves this problem...

One approach is to use re.

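import re

string = "Hello, my name is foo"

def foo(string, parts):
    # Integer division: the repetition count in the pattern must be an int
    x = len(string) // parts
    # Match x characters at a time; ".+?$" picks up a shorter trailing remainder
    print(re.findall(r".{" + str(x) + r"}|.+?$", string))

foo(string, 7)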

Output: ['Hel', 'lo,', ' my', ' na', 'me ', 'is ', 'foo']

I don't know if any module does this... but I feel compelled to say that the problem here is basically the same as "What is the most 'pythonic' way to iterate over a list in chunks?", except you have strings instead of lists. The most pythonic way there should also be the most pythonic here, I suppose, and it's a good thing if you can avoid re. So here is the solution (I'm not sure what you want if the string cannot be evenly divided by the number of parts; I'm assuming you simply discard the "remainder"):

# python 3 version
def foo(string, n):
    part_len = -(-len(string) // n)  # same as math.ceil(len(string) / n)
    # Repeat one shared iterator part_len times so zip pulls consecutive characters
    return [''.join(x) for x in zip(*[iter(string)] * part_len)]

Thus:

>>> s = "Hello, my name is foo"
>>> foo(s, 7)
['Hel', 'lo,', ' my', ' na', 'me ', 'is ', 'foo']
>>> foo(s, 6)
['Hell', 'o, m', 'y na', 'me i', 's fo']

Now, admittedly, having foo(s, 6) return a list of length 5 is somewhat surprising. Maybe you want to raise an exception instead. If you want to keep the remainder, use zip_longest:

from itertools import zip_longest

def foo2(string, n, pad=''):
    part_len = -(-len(string) // n)
    return [''.join(x) for x in zip_longest(*[iter(string)] * part_len, fillvalue=pad)]

>>> foo2(s, 6)
['Hell', 'o, m', 'y na', 'me i', 's fo', 'o']
>>> foo2(s, 6, pad='?')
['Hell', 'o, m', 'y na', 'me i', 's fo', 'o???']
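
Neither foo nor foo2 guarantees exactly n parts in general, because the part length is rounded up first and the count follows from it. If a fixed count matters more than equal lengths, one possible sketch is to spread the remainder over the leading parts with divmod. The name split_evenly and the choice of ValueError are assumptions made for illustration; they do not come from the original answer.

```python
def split_evenly(string, n):
    """Split string into exactly n parts whose lengths differ by at most one."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # base: minimum length of every part; extra: how many parts get one more char
    base, extra = divmod(len(string), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts each take one additional character
        end = start + base + (1 if i < extra else 0)
        parts.append(string[start:end])
        start = end
    return parts


s = "Hello, my name is foo"
print(split_evenly(s, 6))
# ['Hell', 'o, m', 'y na', 'me ', 'is ', 'foo']
```

With this approach nothing is discarded or padded, at the cost of the parts no longer all having the same length.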

I don't think there is a builtin, but I think you could do it with regex: https://stackoverflow.com/a/9477447/1342445

In that case your function generates the regex from len(input) / int(parts), and raises an error if the string's length is not divisible by the number of parts. It would be much simpler with undefined remainder behavior :)

I think it would look something like:

import re


def split_into(string: str, parts: int):
    if (len(string) % parts) != 0:
        raise NotImplementedError('string is not divisible by # parts')

    chunk_size = len(string) // parts  # integer division; '.' * float raises TypeError
    regex = '.'*chunk_size
    return re.findall(regex, string)
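
One caveat applies to the regex-based answers: by default `.` does not match a newline, so a string containing line breaks would silently lose characters. A minimal sketch of a variant that passes re.DOTALL is shown here. The name split_with_regex, the use of ValueError, and the handling of the empty string are assumptions for illustration only.

```python
import re


def split_with_regex(string, parts):
    """Split string into `parts` equal chunks, newlines included."""
    if parts <= 0 or len(string) % parts != 0:
        raise ValueError("string length is not divisible by the number of parts")
    chunk_size = len(string) // parts
    if chunk_size == 0:
        # Empty input: return `parts` empty strings instead of a zero-width match
        return [""] * parts
    # re.DOTALL makes "." match "\n" as well, so no character is skipped
    pattern = re.compile(r".{%d}" % chunk_size, re.DOTALL)
    return pattern.findall(string)


print(split_with_regex("ab\ncd\n", 3))
# ['ab', '\nc', 'd\n']
```

Without the flag, the same input would yield only ['ab', 'cd'], because the scan skips over each newline.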

Yet another solution to this problem...

# split text to parts
def split_to_parts(txt, parts):
    # return array
    ret = []
    # calculate part length (integer division)
    part_len = len(txt) // parts
    # iterate and fill the return array
    for i in range(parts):
        # divide the text
        piece = txt[part_len * i:part_len * (i + 1)]
        # add it to the return array
        ret.append(piece)
    # return the array
    return ret

txt = "Hello, my name is foo"
parts = 7
print(split_to_parts(txt, parts))

# output:
# ['Hel', 'lo,', ' my', ' na', 'me ', 'is ', 'foo']
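
Note that split_to_parts drops trailing characters when the length is not divisible by parts (with parts=6, the final "foo" is lost). If that is not wanted, one option is to let the last piece absorb whatever is left. The sketch below assumes that behavior is acceptable; the name split_to_parts_keep_tail is hypothetical.

```python
def split_to_parts_keep_tail(txt, parts):
    """Return exactly `parts` pieces; the last one also holds the remainder."""
    if parts <= 0:
        raise ValueError("parts must be a positive integer")
    part_len = len(txt) // parts
    # The first parts-1 pieces all have length part_len
    pieces = [txt[part_len * i:part_len * (i + 1)] for i in range(parts - 1)]
    # The final piece runs to the end of the string
    pieces.append(txt[part_len * (parts - 1):])
    return pieces


txt = "Hello, my name is foo"
print(split_to_parts_keep_tail(txt, 6))
# ['Hel', 'lo,', ' my', ' na', 'me ', 'is foo']
```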
