
Getting headers from zipped csv file before importing as pandas dataframe

I am trying to import csv files as a pandas dataframe, where the csv files are inside a zip file. To make the import efficient, I want to read the headers first, before I load the file into a pandas dataframe.

Here is what I have tried so far:

import csv
from zipfile import ZipFile
from io import TextIOWrapper
import pandas as pd

with ZipFile(zip_path, 'r') as zipfile:
    with zipfile.open(file_path, 'r') as file:
        reader = csv.reader(TextIOWrapper(file, 'utf-8', newline=''))
        headers = next(reader)

        df = pd.read_csv(file)

The problem is that reading the headers with next(reader) advances the underlying file, so pandas then reads from the new position and the dataframe is imported without the header row.

I would really appreciate any fix.

You can rewind the underlying file with seek(0) after reading the headers, so that pandas reads it from the beginning again:

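import csv
from io import TextIOWrapper
from zipfile import ZipFile

import pandas as pd

with ZipFile('test.zip', 'r') as zipfile:
    with zipfile.open('test.csv', 'r') as file:
        reader = csv.reader(TextIOWrapper(file, 'utf-8', newline=''))
        headers = next(reader)
        # rewind the underlying file (zip members are seekable since Python 3.7)
        file.seek(0)
        df = pd.read_csv(file)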
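
If the goal of reading the headers first is to load only some of the columns, another option is to open the zip member twice instead of rewinding it. The following is only a sketch under assumptions: the helper names read_zipped_csv_header and load_selected_columns and the wanted argument are made up for illustration and are not part of the question, and the file is assumed to be UTF-8 encoded. The only pandas feature used is the usecols parameter of pd.read_csv.

```python
import csv
import io
from zipfile import ZipFile


def read_zipped_csv_header(zip_source, member, encoding="utf-8"):
    """Return the first CSV row (the header) of `member` inside a zip archive.

    `zip_source` may be a path or a binary file-like object.
    Hypothetical helper, not taken from the original post.
    """
    with ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            # An empty file yields an empty header instead of raising.
            return next(csv.reader(text), [])


def load_selected_columns(zip_source, member, wanted):
    """Load only the columns named in `wanted` that exist in the file.

    Hypothetical helper; pandas is imported here so that the header
    helper above can be used without pandas installed.
    """
    import pandas as pd

    header = read_zipped_csv_header(zip_source, member)
    usecols = [name for name in header if name in wanted]
    # Open the member a second time, so no seek() is needed.
    with ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            return pd.read_csv(raw, usecols=usecols)
```

For example, load_selected_columns('test.zip', 'test.csv', {'id', 'name'}) would load only those two columns if they are present. Opening the member twice costs a second decompression of the first chunk, but it avoids sharing one file position between the csv reader and pandas.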
