How do I split a very long string into a list of shorter strings in Python?
In my current Django project I have a model that stores very long strings (5000-10000 or even more characters per DB entry), and I need to split them up when a user requests the record (it really does need to be one record in the DB). What I need is to return a list (a queryset? depends on whether the splitting happens in the "SQL" part or after fetching the whole value and parsing it in the view) of shorter strings (100-500 characters per string in the list I return to the template).
I couldn't find a Python split command, an example, or any kind of answer for this.
I could always count words and append, but I am sure there has to be some kind of function for this sort of thing.
EDIT: Thank you everyone, but I guess I wasn't understood.
Example:
The string: "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"
The string is a TextField of a Django model.
I need to split it, let's say every 5 words, so I will get:
['This is a very long string', 'with many many many many', 'and many more sentences and', 'there is not one character', 'that i can use to', 'split by, just by number', 'of words']
The thing is that almost every programming language has a "split per number of words" kind of utility function, but I can't find one in Python.
Thanks, Erez
>>> s = "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"
>>> l = s.split()
>>> n = 5
>>> [' '.join(l[x:x+n]) for x in range(0, len(l), n)]  # xrange on Python 2
['This is a very long',
'string with many many many',
'many and many more sentences',
'and there is not one',
'character that i can use',
'to split by, just by',
'number of words']
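Since the question also mentions a target of 100-500 *characters* per string rather than a fixed word count, the standard library's `textwrap` module is worth noting: `textwrap.wrap` breaks a long string into pieces of at most a given width without splitting words (the width of 20 below is just for illustration):

```python
import textwrap

text = ("This is a very long string with many many many many "
        "and many more sentences and there is not one character "
        "that i can use to split by, just by number of words")

# Each chunk is at most 20 characters; breaks happen only at whitespace.
chunks = textwrap.wrap(text, width=20)
print(chunks[0])  # -> 'This is a very long'
```

For the use case in the question, `width=500` would give chunks of at most 500 characters.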
Here is an idea:
def split_chunks(s, chunksize):
    pos = 0
    while pos != -1:
        new_pos = s.rfind(" ", pos, pos + chunksize)
        if new_pos == pos:
            new_pos += chunksize  # force a split inside a word
        yield s[pos:new_pos]
        pos = new_pos
This tries to split the string into chunks at most chunksize in length. It tries to split at spaces, but if it can't, it splits in the middle of a word:
>>> foo = "asdf qwerty sderf sdefw regf"
>>> list(split_chunks(foo, 6))
['asdf', ' qwert', 'y', ' sderf', ' sdefw', ' regf', '']
I guess it requires some tweaking though (for instance how to handle splits that occur inside words), but it should give you a starting point.
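One possible tweak along those lines (a sketch, not part of the original answer; the name `split_chunks_clean` and the separator handling are my own): skip the separator space between chunks, and hard-split inside a word only when a whole window contains no space at all:

```python
def split_chunks_clean(s, chunksize):
    # Hypothetical variant of split_chunks: skips separator spaces
    # between chunks, splits at the last space in each window, and
    # falls back to a mid-word split only when the window has none.
    pos, n = 0, len(s)
    while pos < n:
        while pos < n and s[pos] == " ":  # skip separator spaces
            pos += 1
        if pos >= n:
            break
        if pos + chunksize >= n:
            yield s[pos:]                 # remainder fits in one chunk
            break
        cut = s.rfind(" ", pos, pos + chunksize)
        if cut == -1:                     # no space in window: split mid-word
            cut = pos + chunksize
        yield s[pos:cut]
        pos = cut

print(list(split_chunks_clean("asdf qwerty sderf sdefw regf", 6)))
# -> ['asdf', 'qwerty', 'sderf', 'sdefw', 'regf']
```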
To split by number of words, do this:
def split_n_chunks(s, words_per_chunk):
    s_list = s.split()
    pos = 0
    while pos < len(s_list):
        yield s_list[pos:pos + words_per_chunk]
        pos += words_per_chunk
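Note that as written, split_n_chunks yields lists of words rather than strings; joining each chunk with a space gives the list the question asks for (the function is restated here so the example runs on its own):

```python
def split_n_chunks(s, words_per_chunk):
    # Yields lists of up to words_per_chunk words each.
    s_list = s.split()
    pos = 0
    while pos < len(s_list):
        yield s_list[pos:pos + words_per_chunk]
        pos += words_per_chunk

s = ("This is a very long string with many many many many "
     "and many more sentences and there is not one character "
     "that i can use to split by, just by number of words")
# Join each word list back into a single string per chunk.
chunks = [" ".join(words) for words in split_n_chunks(s, 5)]
print(chunks[0])  # -> 'This is a very long'
```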