Loading data from .csv files into an array

Question

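I have a dataset consisting of 57 .csv files and want to read them all into a single variable (called FOS), so FOS has to be an array. How can I load these .csv files into an array using Pandas? Note that some of the files are missing...

I tried writing a for loop that puts each file at a specific position in the array, e.g. File_1.csv at FOS[0] and File_57 at FOS[57].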

FOS=[]
for i in range(1,57):        
    if i != 5:      # Because Filename_5 is missing in the dataset...
        FOS[i]=pd.read_csv("Path\Filename{0}.csv".format(i), some more parameters like name)

But now I get the error: "IndexError: list assignment index out of range"

Answer 1

You can do it in a compact way:

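import os

import pandas as pd

# range(1, 58) covers Filename1.csv through Filename57.csv; missing files are skipped
FOS = [pd.read_csv(f"Path/Filename{i}.csv")
       for i in range(1, 58)
       if os.path.exists(f"Path/Filename{i}.csv")]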

Explanation: this uses a list comprehension, meaning that the expression [...] builds the list. It is equivalent to writing:

FOS = list()
for i in range(1, 58):  # 1 through 57 inclusive
    if os.path.exists(f"Path/Filename{i}.csv"):
        FOS.append(pd.read_csv(f"Path/Filename{i}.csv"))
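
Note that appending compacts the list: once a file is skipped, the list index no longer matches the file number. The IndexError in the question arises because FOS = [] has no slots yet, so FOS[i] = ... has nothing to assign to. If the position has to match the file number, one option is a dict keyed by that number. The following is only a minimal sketch of that idea, not part of the original answer; the helper name load_by_number and its parameters are hypothetical.

```python
import os


def load_by_number(path_template, numbers, reader):
    """Map each file number to reader(path) for the files that exist.

    Hypothetical helper: path_template, numbers and reader are illustrative
    names, e.g. path_template="Path/Filename{}.csv" and reader=pd.read_csv.
    """
    loaded = {}
    for i in numbers:
        path = path_template.format(i)
        if os.path.exists(path):
            # Assigning to a new dict key is always valid, unlike FOS[i] = ...
            # on an empty list, which raises IndexError.
            loaded[i] = reader(path)
    return loaded
```

With pandas imported as pd, a call such as FOS = load_by_number("Path/Filename{}.csv", range(1, 58), pd.read_csv) would make FOS[6] the data from Filename6.csv, while a missing file simply has no key.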

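The condition if os.path.exists(f"Path/Filename{i}.csv") is a more dynamic way of excluding file 5 than hard-coding its number. It is more convenient if you do this more often and your input files vary. In that case, though, you should perhaps read the list of files from the directory instead (e.g. with os.listdir).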
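
The os.listdir suggestion could be sketched as follows. This is an illustration under assumptions rather than the answerer's code: the helper name list_numbered_csvs and the default prefix "Filename" are made up for the example. It lists the directory, keeps only names of the form Filename<number>.csv, and sorts them by number so that Filename10.csv comes after Filename9.csv.

```python
import os
import re


def list_numbered_csvs(directory, prefix="Filename"):
    """Return paths of <prefix><number>.csv files in directory, sorted by number.

    Hypothetical helper for illustration; the default prefix is an assumption.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.csv$")
    numbered = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), os.path.join(directory, name)))
    # Sort numerically rather than alphabetically
    return [path for _, path in sorted(numbered)]
```

Each returned path can then be passed to pd.read_csv, for example FOS = [pd.read_csv(p) for p in list_numbered_csvs("Path")].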

Answer 2

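You can make it more dynamic. First move all the files that need to be read into one directory. Then use the os module to walk that directory and collect all the file paths, in case you have subdirectories.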

import os

import pandas as pd


def _fetch_file_locations(root_path: str, extension: str) -> list:
    """
    This function reads all files of a particular extension. It traverses 
    through sub directories and finds all files 
    :param root_path: the path from where it needs to start looking for files
    :param extension: the extension of the file that it's looking for
    :return: the array of file paths of all the files that were found
    """
    if not os.path.isdir(root_path):
        raise NotADirectoryError(f'There is no directory at path: {root_path}')

    # Walk the directory tree and keep only files whose name ends with the extension
    file_collection = [os.path.join(root, file) for root, dirs, files in os.walk(root_path)
                       for file in files if file.endswith(extension)]

    return file_collection


def main(root_path: str):
    all_files = _fetch_file_locations(root_path, extension='.csv')

    # uses pandas to read all the CSV files and convert each dataframe to a list of dicts (one per row)
    file_contents = [pd.read_csv(file_path).to_dict('records') for file_path in all_files]

    # converts the array of arrays into a single array of dicts
    all_contents_in_one = [record for content in file_contents for record in content]

    print(f"Found {len(all_contents_in_one)} records after merging {len(all_files)} files")


if __name__ == '__main__':
    main(r'X:\worky')
