
Iterate through a string in chunks of different sizes in Python

I am working with files in Python. There is probably a name for this format, but I'm not sure what it is: the files are like CSV files but with no separator. Each line holds data in which the first 7 characters are an ID number, the next 5 are something else, and so on. I want to go through the file, read each line, split it up, and store the pieces in a list. Here is an example:

From the file: "0030108102017033119080001010048000000"

These are the chunks I would like to split the string into: [7, 2, 8, 6, 2, 2, 5, 5]. Each number represents the length of one chunk.

First I tried this:

n = [7, 2, 8, 6, 2, 2, 5, 5]
for i in range(0, 37, n):
    print(i)

Naturally this didn't work, so I started thinking about other possible methods, and they all seem quite complex. I looked around online and could only find solutions for evenly sized chunks. Any input?

EDIT: The answer I'm looking for should, in this case, look like this: ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000'], where each value in the list n represents the length of the corresponding chunk.

If these are ASCII strings (or rather, one byte per character), I might use struct.unpack for this.

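>>> import struct
>>> sizes = [7, 2, 8, 6, 2, 2, 5, 5]
>>> s = "0030108102017033119080001010048000000"
>>> fmt = ''.join("%ds" % x for x in sizes)
>>> # On Python 3, struct works on bytes, so encode first and decode each field.
>>> [f.decode('ascii') for f in struct.unpack(fmt, s.encode('ascii'))]
['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
>>>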
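When many lines share the same layout, the format can be compiled once and reused. The following is only a sketch of that idea: the names SIZES, RECORD, and parse_record are illustrative and not part of the original answer, and it assumes every line is pure ASCII and exactly as long as the sum of the sizes (otherwise struct raises struct.error).

```python
import struct

# Field widths from the question; SIZES and RECORD are illustrative names.
SIZES = [7, 2, 8, 6, 2, 2, 5, 5]

# Compile the format "7s2s8s6s2s2s5s5s" once so it can be reused for every line.
RECORD = struct.Struct("".join("%ds" % n for n in SIZES))


def parse_record(line):
    """Split one fixed-width ASCII line into a list of string fields."""
    # struct operates on bytes, so drop the newline and encode first.
    raw = line.rstrip("\n").encode("ascii")
    # unpack returns a tuple of bytes objects; decode each back to str.
    return [field.decode("ascii") for field in RECORD.unpack(raw)]


fields = parse_record("0030108102017033119080001010048000000\n")
print(fields)
```

RECORD.size gives the expected line length in bytes, which can be handy for validating a line before unpacking it.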

Otherwise, you can construct the necessary slice objects from partial sums of the sizes, which is simple to do if you are using Python 3:

>>> import itertools
>>> s = "0030108102017033119080001010048000000"
>>> psums = list(itertools.accumulate([0] + sizes))
>>> [s[slice(*i)] for i in zip(psums, psums[1:])]
['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']

accumulate can be implemented in Python 2 with something like:

def accumulate(itr):
    # Yield the running total of the values in itr.
    total = 0
    for x in itr:
        total += x
        yield total

from itertools import accumulate, chain
s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
ranges = list(accumulate(n))  # end offset of each chunk
# Pair each start offset with its end offset and slice.
list(map(lambda i: s[i[0]:i[1]], zip(chain([0], ranges), ranges)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
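The partial-sum approach used in the answers above can be wrapped in a small helper. This is a minimal sketch for Python 3; the name split_by_sizes is hypothetical and does not come from either answer. Unlike the struct approach, it slices the string directly, so it does not depend on one byte per character.

```python
from itertools import accumulate


def split_by_sizes(s, sizes):
    """Return consecutive slices of s whose lengths are given by sizes."""
    ends = list(accumulate(sizes))  # cumulative end offsets: 7, 9, 17, ...
    starts = [0] + ends[:-1]        # each chunk starts where the previous one ended
    return [s[a:b] for a, b in zip(starts, ends)]


result = split_by_sizes("0030108102017033119080001010048000000",
                        [7, 2, 8, 6, 2, 2, 5, 5])
print(result)
```

If the string is shorter than the sum of the sizes, slicing simply returns shorter (or empty) trailing chunks instead of raising an error.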

Could you try this?

n = [7, 2, 8, 6, 2, 2, 5, 5]
for line in file:  # file is an already-open file object
    total = 0  # offset of the next chunk within the line
    for i in n:
        print(line[total:total+i])
        total += i

This is how I might have done it. The code iterates through each line in the file and, for each line, iterates through the list of lengths n. This can be amended to do something other than print, but the idea is that each pass takes a slice from the line. The total variable keeps track of how far into the line we are.
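As one way of amending the loop to store the slices instead of printing them, here is a sketch that collects each line's chunks into a list. The helper name split_line is hypothetical, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io

SIZES = [7, 2, 8, 6, 2, 2, 5, 5]


def split_line(line, sizes):
    """Collect the slices of one line into a list instead of printing them."""
    fields = []
    total = 0  # how far into the line we are
    for size in sizes:
        fields.append(line[total:total + size])
        total += size
    return fields


# io.StringIO is a stand-in for an open file containing two identical records.
sample = io.StringIO("0030108102017033119080001010048000000\n" * 2)
rows = [split_line(line, SIZES) for line in sample]
print(rows)
```

The trailing newline on each line is ignored here because the sizes add up to 37, the length of the data itself.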

Here's a generator that yields the chunks by iterating through the characters of the string and forming substrings from them. You can use this to process any iterable of strings in this fashion:

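from itertools import islice

def chunks(s, sizes):
    it = iter(s)
    for size in sizes:
        # Take up to `size` items; islice simply stops early if the input runs out.
        chunk = ''.join(islice(it, size))
        yield chunk
        if len(chunk) < size:
            return  # input exhausted: stop after the partial chunk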

s="0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
print(list(chunks(s, n)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']

