
Read in all csv files from a directory using Python

I hope this is not trivial but I am wondering the following:

If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?

For a single file, for example, I do something like this and perform some calculations on the x array:

import csv
import os
import numpy

directoryPath = input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # Creates the array that will undergo a set of calculations

I know that I can check how many csv files there are in a given folder (check here):

import glob

for file_name in glob.glob("*.csv"):
    print(file_name)

But I failed to figure out how to nest the numpy.genfromtxt() function in a for loop, so that I can read in all the csv files of a directory that I specify.

EDIT

The folder I have only has jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore consider the file names the way they are.

That's how I'd do it:

import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            f = open(os.path.join(root, file), 'r')  # join with root so files in subfolders are found
            # perform calculation
            f.close()
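If you would rather keep using numpy.genfromtxt() as in the question, the same walk can feed it the full path. A minimal sketch, assuming comma-delimited files whose third column is the one of interest:

import os
import numpy as np

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            x = np.genfromtxt(os.path.join(root, file), delimiter=",")[:, 2]
            # perform calculation on x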

Using pandas and glob as the base packages

import glob
import pandas as pd

# directoryPath should end with a path separator, e.g. '/data/'
glued_data = pd.DataFrame()
for file_name in glob.glob(directoryPath + '*.csv'):
    x = pd.read_csv(file_name, low_memory=False)
    glued_data = pd.concat([glued_data, x], axis=0)

I think you are looking for something like this:

import glob
import numpy as np

for file_name in glob.glob(directoryPath + '*.csv'):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations

Edit

If you want to get all csv files from a folder (including subfolders) you could use subprocess instead of glob (note that this code only works on Linux systems):

import subprocess
import numpy as np

# check_output returns bytes on Python 3, hence the decode()
file_list = subprocess.check_output(['find', directoryPath, '-name', '*.csv']).decode().split('\n')[:-1]

for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index

It first searches the folder and sub-folders for all file names using the find command from the shell and applies your calculations afterwards.
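A cross-platform alternative (a sketch, not part of the original answer) is glob's recursive mode, which needs the ** pattern and recursive=True (Python 3.5+); it assumes directoryPath ends with a path separator:

import glob
import numpy as np

file_list = glob.glob(directoryPath + '**/*.csv', recursive=True)

for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations; i is available as an index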

According to the documentation of numpy.genfromtxt(), the first argument can be a

File, filename, or generator to read.

That would mean that you could write a generator that yields the lines of all the files, like this:

import glob
import numpy

def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        with open(file_name) as f:  # open the file; iterating over the name string would yield characters
            for line in f:
                yield line

# then using it like this

numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')

should work. (I do not have numpy installed, so cannot test easily)
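If each csv starts with a header row, feeding the raw merged lines to numpy.genfromtxt() would mix headers into the data. A hedged variation of the generator (untested sketch, same assumptions as above) could drop the first line of every file:

import glob

def csv_merge_generator_no_header(pattern):
    for file_name in glob.glob(pattern):
        with open(file_name) as f:
            next(f, None)  # skip the header line of each file
            for line in f:
                yield line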

Here's a more succinct way to do this, given some path = "/path/to/dir/".

import glob
import pandas as pd

pd.concat([pd.read_csv(f) for f in glob.glob(path+'*.csv')])

Then you can apply your calculation to the whole dataset, or, if you want to apply it one by one:

pd.concat([process(pd.read_csv(f)) for f in glob.glob(path+'*.csv')])
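process is not defined in this answer; purely as an illustration, it could be any function that takes a dataframe and returns one, for example (hypothetical):

def process(df):
    # hypothetical per-file step: work on a copy and add a derived column
    df = df.copy()
    df["x_squared"] = df.iloc[:, 2] ** 2  # assumes the third column holds the values of interest
    return df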

The function below will return a dictionary containing a dataframe for each .csv file in the folder within your defined path.

import pandas as pd
import glob
import os
import ntpath

def panda_read_csv(path):
    pd_csv_dict = {}
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for csv_file in csv_files:
        file_name = ntpath.basename(csv_file)
        # sep and encoding match the original answer; adjust them to your files
        pd_csv_dict['pd_' + file_name] = pd.read_csv(csv_file, sep=";", encoding='mac_roman')
    return pd_csv_dict
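A usage sketch (the keys follow the 'pd_' + file name convention used above):

all_frames = panda_read_csv("/path/to/dir")
for name, frame in all_frames.items():
    print(name, frame.shape)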

Another answer using a list comprehension:

from os import listdir

files = [f for f in listdir("./") if f.endswith(".csv")]
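From there, one possible continuation (a sketch, assuming the files live in the current directory) is to read each listed file into a dataframe keyed by its name:

import pandas as pd
from os import listdir

files = [f for f in listdir("./") if f.endswith(".csv")]
dataframes = {f: pd.read_csv(f) for f in files}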

If you want to import your files as separate dataframes, you can try this:

import pandas as pd
import os

filenames = os.listdir("../data/") # lists everything in the directory (assumed to contain only csv files)

def extract_name_files(text): # removes the .csv extension from the name of each file
    name_file = os.path.splitext(text)[0].lower()
    return name_file

names_of_files = list(map(extract_name_files, filenames)) # creates a list that will be used to name your dataframes

for i in range(0, len(names_of_files)): # saves each csv in a dataframe structure
    exec(names_of_files[i] + " = pd.read_csv('../data/' + filenames[i])")
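As a side note, a plain dictionary avoids exec and keeps the dataframes easy to iterate over; a rough equivalent of the loop above would be:

# same result as the exec loop, but stored in a dictionary instead of
# dynamically created variable names
dataframes = {extract_name_files(f): pd.read_csv('../data/' + f) for f in filenames}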

You can use pathlib glob functionality to list all .csv in a path, and pandas to read them. Then it's only a matter of applying whatever function you want (which, if systematic, can also be done within the list comprehension).

import pandas as pd
from pathlib import Path

path2csv = Path("/your/path/")
csvlist = path2csv.glob("*.csv")
csvs = [pd.read_csv(g) for g in csvlist]
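As noted above, the calculation can also go inside the comprehension. A sketch (re-globbing because the csvlist generator is exhausted once iterated, and assuming the third column is the one to work with):

means = [pd.read_csv(g).iloc[:, 2].mean() for g in path2csv.glob("*.csv")]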

You need to import the glob library and then use it as follows:

import glob

path = 'C:\\Users\\Admin\\PycharmProjects\\db_conection_screenshot\\seclectors_absent_images'
filenames = glob.glob(path + "\\*.png")  # the same pattern works with "*.csv"
print(len(filenames))
