Getting headers from a zipped CSV file before importing it as a pandas DataFrame
I am trying to import CSV files as pandas DataFrames, where the CSV files are inside a zip file. For efficient importing, I'm trying to get the headers first, before I load the data into a DataFrame.
What I have tried so far is this:
```python
import csv
from io import TextIOWrapper
from zipfile import ZipFile

import pandas as pd

# zip_path and file_path are defined elsewhere
with ZipFile(zip_path, 'r') as zipfile:
    with zipfile.open(file_path, 'r') as file:
        reader = csv.reader(TextIOWrapper(file, 'utf-8', newline=''))
        headers = next(reader)
        df = pd.read_csv(file)
```
The problem is that when I get the headers with `next(reader)`, the underlying file is affected, and the file is imported as a pandas DataFrame without headers.
I would really appreciate any fix.
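This is most likely because `TextIOWrapper` reads the underlying binary stream in chunks rather than one line at a time, so after `next(reader)` the stream position is already past the header row (for a small file, often at the very end), and `pd.read_csv(file)` starts reading from there. The sketch below shows the effect, assuming an in-memory `io.BytesIO` buffer is a fair stand-in for the zip member; the CSV contents are made up for illustration.

```python
import csv
import io

# Made-up CSV bytes standing in for the zip member.
data = b"a,b,c\n1,2,3\n4,5,6\n"
raw = io.BytesIO(data)

# Keep a reference to the wrapper so it is not garbage-collected
# (collecting it would also close `raw`).
text = io.TextIOWrapper(raw, "utf-8", newline="")
reader = csv.reader(text)
headers = next(reader)

header_len = len(b"a,b,c\n")
print(headers)                  # ['a', 'b', 'c']
# The wrapper buffered ahead, so the binary stream is past the header row.
print(raw.tell() > header_len)  # True
```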
You can reset the read position of the underlying file with its `seek()` method before calling `pd.read_csv()`:
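```python
import csv
from io import TextIOWrapper
from zipfile import ZipFile
import pandas as pd

with ZipFile('test.zip', 'r') as zipfile:
    with zipfile.open('test.csv', 'r') as file:
        reader = csv.reader(TextIOWrapper(file, 'utf-8', newline=''))
        headers = next(reader)
        # rewind the zip member to its start (ZipExtFile supports seek() since Python 3.7)
        file.seek(0)
        df = pd.read_csv(file)
```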
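For reference, here is a self-contained sketch of the same idea that builds a small zip archive in memory instead of reading `test.zip` from disk; the archive name and contents are hypothetical. It also shows one alternative for getting just the headers, `pd.read_csv(file, nrows=0)`, which parses only the header row into an empty DataFrame. Either way the zip member has to be rewound with `seek(0)` afterwards, because that read also advances the file position.

```python
import csv
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory so the example is self-contained.
# "test.csv" and its contents are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.csv", "a,b,c\n1,2,3\n4,5,6\n")
buf.seek(0)

with zipfile.ZipFile(buf, "r") as zf:
    with zf.open("test.csv", "r") as file:
        # Option 1: read the header row with the csv module.
        text = io.TextIOWrapper(file, "utf-8", newline="")
        headers = next(csv.reader(text))

        # Rewind the zip member (ZipExtFile.seek() needs Python 3.7+).
        file.seek(0)

        # Option 2: let pandas parse only the header row.
        columns = list(pd.read_csv(file, nrows=0).columns)

        # Rewind again, then load the full file with its header intact.
        file.seek(0)
        df = pd.read_csv(file)

print(headers)   # ['a', 'b', 'c']
print(columns)   # ['a', 'b', 'c']
print(df.shape)  # (2, 3)
```

On Python versions older than 3.7, where `ZipExtFile` cannot seek, one option is to open the member a second time with `zf.open()` instead of rewinding it.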
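Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.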