
Iterate through a string in chunks of different sizes python

So I am working with files in Python. I feel like there is a name for them, but I'm not sure what it is. They are like CSV files but with no separator. Anyway, in my file I have lots of lines of data where the first 7 characters are an ID number, then the next 5 are something else, and so on. I want to go through the file, reading each line, splitting it up, and storing the pieces in a list. Here is an example:

From the file: "0030108102017033119080001010048000000"

These are the chunks I would like to split the string into: [7, 2, 8, 6, 2, 2, 5, 5]. Each number is the length of one chunk.

First I tried this:

n = [7, 2, 8, 6, 2, 2, 5, 5]
for i in range(0, 37, n):
    print(i)

Naturally this didn't work, so now I've started thinking about possible methods and they all seem quite complex. I looked around online and couldn't find anything, only solutions for equal-sized chunks. So, any input?

EDIT: The answer I'm looking for should in this case look like this: ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000'], where each value in the list n is the length of the corresponding chunk.

If these are ASCII strings (or rather, one byte per character), I might use struct.unpack for this.

>>> import struct
>>> sizes = [7, 2, 8, 6, 2, 2, 5, 5]
>>> struct.unpack(''.join("%ds" % x for x in sizes), "0030108102017033119080001010048000000")
('0030108', '10', '20170331', '190800', '01', '01', '00480', '00000')
>>>
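Note that on Python 3 struct.unpack expects a bytes-like object and returns bytes, so the session above works as shown only on Python 2. A minimal sketch for Python 3, assuming every line is pure ASCII (the helper name split_fixed_struct is made up for illustration):

```python
import struct

def split_fixed_struct(line, sizes):
    """Split an ASCII string into fields of the given widths via struct."""
    # Build a format such as "7s2s8s..." ("Ns" means N raw bytes).
    fmt = "".join("%ds" % size for size in sizes)
    # unpack() needs bytes, and the data length must equal struct.calcsize(fmt).
    fields = struct.unpack(fmt, line.encode("ascii"))
    return [field.decode("ascii") for field in fields]

sizes = [7, 2, 8, 6, 2, 2, 5, 5]
print(split_fixed_struct("0030108102017033119080001010048000000", sizes))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
```

Because struct.unpack raises struct.error when the length does not match the format exactly, a trailing newline would need to be stripped from each line first.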

Otherwise, you can construct the necessary slice objects from partial sums of the sizes, which is simple to do if you are using Python 3:

>>> import itertools
>>> s = "0030108102017033119080001010048000000"
>>> psums = list(itertools.accumulate([0] + sizes))
>>> [s[slice(*i)] for i in zip(psums, psums[1:])]
['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
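The same idea wrapped in a reusable function, as a rough sketch for Python 3 (the name split_by_sizes is hypothetical). Each pair of neighbouring partial sums gives the start and end offset of one chunk:

```python
from itertools import accumulate

def split_by_sizes(s, sizes):
    """Return the consecutive chunks of s whose lengths are given by sizes."""
    # Partial sums [0, 7, 9, 17, ...] mark where each chunk starts and ends.
    offsets = list(accumulate([0] + list(sizes)))
    return [s[start:end] for start, end in zip(offsets, offsets[1:])]

s = "0030108102017033119080001010048000000"
print(split_by_sizes(s, [7, 2, 8, 6, 2, 2, 5, 5]))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
```

Slicing never raises on short input, so a line shorter than the sizes add up to simply produces shorter or empty trailing chunks. Whether that is acceptable depends on the data.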

accumulate can be implemented in Python 2 with something like

def accumulate(itr):
    total = 0
    for x in itr:
        total += x
        yield total

from itertools import accumulate, chain
s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
ranges = list(accumulate(n))  # end offset of each chunk
list(map(lambda i: s[i[0]:i[1]], zip(chain([0], ranges), ranges)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']

Could you try this?

for line in file:
    n = [7, 2, 8, 6, 2, 2, 5, 5]
    total = 0
    for i in n:
        print(line[total:total+i])
        total += i 

This is how I might have done it. The code iterates through each line in the file and, for each line, iterates through the list n of chunk lengths you need to pull out. This can be amended to do something other than print, but the idea is that each pass takes a slice of the line. The total variable keeps track of how far into the line we are.
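Since the question asks for the pieces to be stored in a list rather than printed, the loop might be adapted along these lines. This is only a sketch: parse_line and parse_file are illustrative names, and io.StringIO stands in for a real open file.

```python
import io

SIZES = [7, 2, 8, 6, 2, 2, 5, 5]

def parse_line(line, sizes):
    """Slice one record into fields, tracking the running offset."""
    fields = []
    total = 0
    for size in sizes:
        fields.append(line[total:total + size])
        total += size
    return fields

def parse_file(lines, sizes=SIZES):
    """Parse every non-empty line of an iterable of lines (e.g. an open file)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")  # drop the newline so it cannot leak into a field
        if line:
            records.append(parse_line(line, sizes))
    return records

# StringIO simulates a file holding the sample line followed by a blank line.
sample = io.StringIO("0030108102017033119080001010048000000\n\n")
print(parse_file(sample))
# [['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']]
```

With a real file, the call would be something like parse_file(open(path)) inside a with block, where path is whatever file you are reading.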

Here's a generator that yields the chunks by iterating through the characters of the string and forming substrings from them. You can use this to process any iterable of characters in this fashion:

def chunks(s, sizes):
    it = iter(s)
    for size in sizes:
        l = []
        try:
            for _ in range(size):
                l.append(next(it))
        finally:
            yield ''.join(l)

s="0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
print(list(chunks(s, n)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
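One caveat, as far as I can tell: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into a RuntimeError, so the version above fails if the input is shorter than the sizes add up to. A possible variant using itertools.islice avoids calling next() directly. Note that it behaves differently on short input, yielding a partial chunk followed by empty strings:

```python
from itertools import islice

def chunks(iterable, sizes):
    """Yield consecutive chunks of the given sizes as strings."""
    it = iter(iterable)
    for size in sizes:
        # islice stops quietly when the iterator runs dry, so no
        # StopIteration escapes from inside the generator body.
        yield "".join(islice(it, size))

s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
print(list(chunks(s, n)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
```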
