
How to skip footer in csv file while reading with pandas read_csv and chunksize option

I'm reading big CSV files using pandas.read_csv() with chunksize=500000. Because I'm using chunksize, the "skipfooter=1" option doesn't work: read_csv returns an iterator over chunks instead of a single DataFrame.

What's the best way to skip the footer record in the file while reading in chunks?

Something like this would work:

import pandas

chunksize = 5
csv = pandas.read_csv('sample.csv', chunksize=chunksize)

class NextIterator:
    def __init__(self, iterator):
        self._iterator = iterator
        self._buffer = []

    def __iter__(self):
        return self

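    @property
    def has_next(self):
        # Peek at the next chunk and keep it in the buffer so it is not lost.
        # If a chunk is already buffered, don't fetch (and overwrite) another.
        if self._buffer:
            return True
        try:
            self._buffer = [next(self._iterator)]
            return True
        except StopIteration:
            return False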

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()
        else:
            # returns the dataframe
            return next(self._iterator)

has_next = True
b = NextIterator(csv)
while has_next:
    a = next(b)
    if b.has_next:
        # more chunks follow, so this one contains no footer
        print(a)
    else:
        # last chunk: drop its final row (the footer)
        print(a[:-1])
        has_next = False

You don't necessarily need to create a class, but I found it useful.

By calling next and catching StopIteration, you can check whether the iterator has any more chunks. If it doesn't, the current chunk is the last one, so you can slice it to drop its final row (the footer).
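The same look-ahead idea can also be written as a small generator, without a class. This is only a sketch: the function name chunks_without_footer, the in-memory sample data and the chunk size of 2 are made up for illustration, and it assumes the footer is exactly one row, so it always falls in the last chunk.

```python
import io

import pandas as pd


def chunks_without_footer(chunks):
    """Yield every chunk unchanged, except the last one, which loses its final row."""
    iterator = iter(chunks)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # no chunks at all, nothing to yield
    for current in iterator:
        # Another chunk exists, so `previous` cannot contain the footer.
        yield previous
        previous = current
    # `previous` is now the final chunk; drop the single footer row.
    yield previous.iloc[:-1]


# Hypothetical sample: five data rows followed by a one-line footer.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\nEND,5\n")
reader = pd.read_csv(sample, chunksize=2)
result = pd.concat(list(chunks_without_footer(reader)), ignore_index=True)
print(result)
```

Holding back one chunk at a time keeps memory use at roughly two chunks. Note that if the last chunk consists of the footer row only, the final yielded DataFrame is empty, which downstream code may want to skip.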
