How to skip the footer of a CSV file when reading it with pandas read_csv and the chunksize option

Question

I'm reading large CSV files with pandas.read_csv() and chunksize=500000. Because chunksize makes read_csv return an iterator of DataFrames (a TextFileReader) instead of a single DataFrame, the skipfooter=1 option doesn't work with it.

What's the best way to skip the footer record when reading the file in chunks?
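The constraint can be reproduced with a small in-memory file. This is a sketch assuming a hypothetical two-row file followed by a one-line `TRAILER` footer. A whole-file read honours `skipfooter` (it needs `engine="python"`), while adding `chunksize` makes pandas raise a `ValueError` instead of returning chunks (the exact message may vary by pandas version).

```python
import io

import pandas as pd

# Hypothetical sample: two data rows followed by a one-line footer record.
DATA = "id,value\n1,10\n2,20\nTRAILER,2\n"

# Whole-file read: skipfooter drops the last line (it requires engine="python").
df = pd.read_csv(io.StringIO(DATA), skipfooter=1, engine="python")
print(df)

# Chunked read: pandas rejects skipfooter combined with chunksize.
try:
    for chunk in pd.read_csv(io.StringIO(DATA), skipfooter=1,
                             engine="python", chunksize=1):
        print(chunk)
except ValueError as exc:
    print("ValueError:", exc)
```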

Answer 1

Something like this would work:

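import pandas as pd

chunksize = 5
reader = pd.read_csv('sample.csv', chunksize=chunksize)

class NextIterator:
    """Wraps an iterator and adds a has_next look-ahead check."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._buffer = []

    def __iter__(self):
        return self

    @property
    def has_next(self):
        # A chunk fetched by an earlier check is still waiting to be returned.
        if self._buffer:
            return True
        try:
            # Read ahead one chunk and keep it for the next __next__ call.
            self._buffer.append(next(self._iterator))
            return True
        except StopIteration:
            return False

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()
        # Nothing buffered: read the next DataFrame chunk directly.
        return next(self._iterator)

chunks = NextIterator(reader)
more = chunks.has_next  # False if the reader yields no chunks at all
while more:
    chunk = next(chunks)
    more = chunks.has_next
    if more:
        print(chunk)
    else:
        # Last chunk: drop its final row, which is the footer record.
        print(chunk.iloc[:-1])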

You don't necessarily need to create a class, but I found it useful.

Calling next() and catching StopIteration tells you whether the iterator has another chunk. If it doesn't, the current chunk is the last one, so you can slice it to exclude its final row (the footer).
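As the answer says, the class isn't strictly necessary. A generator function can apply the same look-ahead idea by holding one chunk back: each chunk is yielded only after the next one has been read, so once the reader is exhausted the chunk still held must be the last, and its final row is dropped. This is a sketch assuming a single footer row; `chunks_without_footer` and the sample data are hypothetical.

```python
import io
from collections.abc import Iterable, Iterator

import pandas as pd


def chunks_without_footer(reader: Iterable[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield chunks from `reader`, dropping the last row of the final chunk.

    Hypothetical helper: assumes the footer is exactly one row, so it always
    sits at the end of the final chunk.
    """
    previous = None
    for chunk in reader:
        # A newer chunk exists, so the held-back one is not the last: emit it.
        if previous is not None:
            yield previous
        previous = chunk
    if previous is not None:
        # The reader is exhausted; the held-back chunk is the last one.
        yield previous.iloc[:-1]


# Hypothetical sample: three data rows and a one-line footer.
DATA = "id,value\n1,10\n2,20\n3,30\nTRAILER,3\n"

# dtype=str: the footer is parsed together with the last chunk, which could
# otherwise make that chunk's columns infer a different dtype than the others.
reader = pd.read_csv(io.StringIO(DATA), chunksize=2, dtype=str)
result = pd.concat(chunks_without_footer(reader), ignore_index=True)
print(result)
```

Because the generator is lazy, it holds at most two chunks in memory at a time; you can loop over it directly instead of concatenating when each chunk should be processed separately.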

solution1
0 2019-08-21 16:35:46