I'm reading big CSV files using pandas.read_csv() with chunksize=500000. The skipfooter=1 option doesn't work together with chunksize, because read_csv then returns an iterator of chunks instead of a single DataFrame.
What's the best way to skip the footer record while reading the file in chunks?
Something like this would work:
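import pandas

chunksize = 5
csv = pandas.read_csv('sample.csv', chunksize=chunksize)


class NextIterator:
    """Wrap an iterator so we can check whether another item is available."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._buffer = []

    def __iter__(self):
        return self

    @property
    def has_next(self):
        # An item fetched by an earlier check is still waiting to be returned.
        if self._buffer:
            return True
        try:
            # Look ahead by one item and keep it for the next __next__ call.
            self._buffer = [next(self._iterator)]
            return True
        except StopIteration:
            return False

    def __next__(self):
        if self._buffer:
            return self._buffer.pop()
        # Returns the next DataFrame chunk.
        return next(self._iterator)


b = NextIterator(csv)
has_next = b.has_next  # False when the file has no data rows at all
while has_next:
    a = next(b)
    if b.has_next:
        print(a)
    else:
        # Last chunk: drop the footer row.
        print(a[:-1])
        has_next = False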
You don't necessarily need to create a class, but I found it useful.
By calling next and catching StopIteration, you can check whether the iterator has any more chunks. If it doesn't, the current chunk is the last one, so you can slice it to exclude the final row.
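If you'd rather not write a class, the same look-ahead idea can be sketched as a small generator that holds back one chunk at a time. This is only a sketch under a few assumptions: the helper name iter_chunks_without_footer and the in-memory sample data are made up for illustration, the footer is exactly one row, and that row has the same number of fields as the data rows so it parses without an error.

```python
import io

import pandas as pd


def iter_chunks_without_footer(source, chunksize, **kwargs):
    """Yield chunks from read_csv, dropping the very last row (the footer).

    Hypothetical helper: it assumes the footer is a single row.
    """
    previous = None
    for chunk in pd.read_csv(source, chunksize=chunksize, **kwargs):
        # A newer chunk exists, so the held-back one cannot be the last.
        if previous is not None:
            yield previous
        previous = chunk
    if previous is not None:
        trimmed = previous.iloc[:-1]  # remove the footer row
        if not trimmed.empty:
            yield trimmed


# Made-up sample data: five records followed by a one-line trailer.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\nTRAILER,5\n")
chunks = list(iter_chunks_without_footer(sample, chunksize=2))
values = [v for chunk in chunks for v in chunk["value"].tolist()]
```

One thing to watch for: pandas infers dtypes per chunk, so a text footer can turn a numeric column in the last chunk into strings. Passing an explicit dtype, or converting after trimming, may be needed depending on your data.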