
How to access multiple CSV files that share the same name from multiple folders in a zip file

I have a zip file (stored locally) with multiple folders in it. Each folder contains a few CSV files, and I only need one particular CSV from each folder. The CSVs I am trying to access all share the same name, but I cannot figure out how to read that particular file from each folder and then concatenate them into a pandas DataFrame.

I have tried the following (initially trying to read all the CSVs):

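import glob
import os

import pandas as pd

path = r"C:\Users\...\Downloads\folder.zip"
# Note: glob only searches the filesystem, so it cannot see files inside a .zip
all_files = glob.glob(os.path.join(path, "/*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)

# all_files is empty here, so pd.concat raises "No objects to concatenate"
frame = pd.concat(li, axis=0, ignore_index=True)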

But I get: ValueError: No objects to concatenate. The CSVs are definitely present and not empty.

I am currently doing this in a SageMaker notebook and am not sure whether that is also causing problems. Any help would be great.
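The ValueError most likely comes from all_files being empty: glob only searches the real filesystem and does not look inside a .zip archive, so the pattern matches nothing and pd.concat is handed an empty list. The minimal sketch below uses only the standard library. It builds a throwaway archive (the folder and file names are hypothetical) and shows glob finding no CSVs even with a subfolder wildcard and the leading slash removed. It also shows a second problem with the original pattern: because "/*.csv" starts with a slash, os.path.join treats it as absolute and discards the zip path (on Windows only the drive letter survives, giving something like "C:/*.csv"). csvs_found_by_glob is a hypothetical helper name.

```python
import glob
import os
import posixpath
import tempfile
import zipfile


def csvs_found_by_glob(zip_path: str) -> list[str]:
    """Glob for CSVs 'inside' a zip file path, as the question's code tries to do."""
    # Subfolder wildcard added and leading slash removed, so zip_path stays intact.
    return glob.glob(os.path.join(zip_path, "*", "*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: two folders, each holding a CSV with the same name.
    zip_path = os.path.join(tmp, "folder.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("folder1/file.csv", "a,b\n1,2\n")
        zf.writestr("folder2/file.csv", "a,b\n3,4\n")

    # folder.zip is a regular file, not a directory, so glob cannot list its members.
    matches = csvs_found_by_glob(zip_path)
    print(matches)  # []

# A pattern starting with "/" is absolute, so join drops everything before it.
print(posixpath.join("/home/user/folder.zip", "/*.csv"))  # /*.csv
```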

After some digging and advice from Umar.H and mad, I figured out a solution both to my original question and to the code example I was originally working with.

The code I was originally using could not read the zip file directly, so I unzipped the file and tried it on a regular folder instead. The problem of li ending up as an empty list of DataFrames was solved by changing "/*file.csv" in all_files to "*/*file.csv": without the leading slash, os.path.join keeps the folder path, and the "*/" matches each subfolder.
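A minimal sketch of that unzipped-folder fix, assuming the archive has been extracted and each top-level folder holds a CSV with the same (hypothetical) name file.csv. The "*" path component matches each subfolder, and because no component starts with a slash, os.path.join keeps the root folder. The sketch extracts a throwaway archive with zipfile.ZipFile.extractall so it runs on its own; find_shared_csvs is a hypothetical helper name.

```python
import glob
import os
import tempfile
import zipfile


def find_shared_csvs(root: str, filename: str = "file.csv") -> list[str]:
    """Return root/<subfolder>/<filename> for every immediate subfolder of root."""
    # "*" matches each subfolder name; no leading "/" so os.path.join keeps root.
    pattern = os.path.join(root, "*", filename)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical archive: two folders with the same-named CSV plus an extra file.
    zip_path = os.path.join(tmp, "folder.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("folder1/file.csv", "a,b\n1,2\n")
        zf.writestr("folder1/other.csv", "x\n9\n")
        zf.writestr("folder2/file.csv", "a,b\n3,4\n")

    # Extract to a regular folder, as the author did by hand.
    extracted = os.path.join(tmp, "extracted")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extracted)

    found = find_shared_csvs(extracted)
    # Normalise separators so the printed result is the same on every OS.
    relative = [os.path.relpath(p, extracted).replace(os.sep, "/") for p in found]
    print(relative)  # ['folder1/file.csv', 'folder2/file.csv']
```

Passing filename="*file.csv" reproduces the author's looser pattern. Each returned path can then go to pd.read_csv, with the results combined by pd.concat as in the question.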

To solve my main issue, which was to read all the required CSVs without unzipping the zip file, I got the following to work:

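import zipfile

import pandas as pd

PATH = "C:/Users/.../Downloads/folder.zip"

li = []
with zipfile.ZipFile(PATH, "r") as f:
    # namelist() lists every member with its folder, e.g. "folder1/file.csv"
    for name in f.namelist():
        # keep only the shared-name CSV from each folder
        # (note: endswith also matches names such as "myfile.csv")
        if name.endswith("file.csv"):
            # read straight from the archive; no extraction to disk needed
            with f.open(name) as data:
                df = pd.read_csv(data, header=None, low_memory=False)
            li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)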
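A slightly extended sketch of the same in-archive approach, under a few assumptions not in the original: the target CSVs are named exactly file.csv and, unlike the author's header=None, each has a header row. It compares the last component of each member name instead of using endswith, so a member like folder1/myfile.csv is skipped, and it adds a source_folder column (a hypothetical name) so rows stay traceable to their folder after concatenation. read_shared_csvs is likewise a hypothetical helper; the demo archive is built in memory with io.BytesIO.

```python
import io
import posixpath
import zipfile

import pandas as pd


def read_shared_csvs(zip_source, target_name: str = "file.csv") -> pd.DataFrame:
    """Concatenate every archive member whose base name equals target_name."""
    frames = []
    with zipfile.ZipFile(zip_source, "r") as zf:
        for name in zf.namelist():
            # Zip member names always use "/" as the separator.
            folder, base = posixpath.split(name)
            if base != target_name:
                continue
            with zf.open(name) as data:
                df = pd.read_csv(data)
            df["source_folder"] = folder  # remember which folder each row came from
            frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no member named {target_name!r} in the archive")
    return pd.concat(frames, axis=0, ignore_index=True)


# Build a small in-memory archive with the same-named CSV in two folders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("folder1/file.csv", "a,b\n1,2\n")
    zf.writestr("folder1/myfile.csv", "a,b\n9,9\n")
    zf.writestr("folder2/file.csv", "a,b\n3,4\n")
buffer.seek(0)

frame = read_shared_csvs(buffer)
print(frame)
```

Raising FileNotFoundError when nothing matches gives a clearer message than the ValueError that pd.concat raises for an empty list. For an archive on disk, pass the path string instead of the buffer.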

Hope this is helpful for anyone else working with large zip files.


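Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.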

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM