Python: read CSV files into a DataFrame (but skip subfolders)

Question

I am using the code below to read a set of CSV files from a folder into a DataFrame. However, the folder contains a subfolder alongside the CSV files. How can I skip the subfolder and read only the CSV files? The code below throws an error when I run it on a folder that has a subfolder.

import pandas as pd
import glob
import numpy as np
import os
import datetime
import time

path = r'/Users/user/desktop/Sales/'


allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
sale_df = pd.concat(list_)
sale_df

Error message : IsADirectoryError: [Errno 21] Is a directory: 
'/Users/user/desktop/Sales/2018-05-03/20180503000513-kevin@store.com- 
190982.csv-1525305907670.csv'

Could anyone assist with this? Thanks.

EDIT: The issue is that the subdirectory's name itself contains the extension '.csv'.

EDIT in code

path =r'/Users/user/desktop/Sales/2018-05-03/'
files_only = [file for file in 
glob.glob('/Users/user/desktop/Sales/2018-05-03/*.csv') if not 
os.path.isdir(file)]
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(files_only,index_col=None, header=0)
    list_.append(df)
sale_df = pd.concat(list_)
sale_df['filename'] = os.path.basename(csv)
sale_df.append(frame)
sale_df

I get the error below:

ValueError: No objects to concatenate

Could you please assist? Thanks.

Answer 1

My suggestion uses glob.glob to get a list of every file or directory that matches the pattern, then uses the os module to check that each match is a file rather than a directory. The result is a list containing ONLY the files that match the glob.glob() pattern.

import glob
import os

files_only = [file for file in glob.glob('/path/to/files/*.ext') if not os.path.isdir(file)]

You can then use the files_only list in your read_csv loop.

So in your code:
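To illustrate why the directory check matters, here is a minimal sketch that builds a throwaway folder containing one real CSV file and one subfolder whose name ends in ".csv". The names sales.csv and archive.csv are hypothetical, and os.path.isfile is used as an equivalent of the "not os.path.isdir" test for this layout.

```python
import glob
import os
import tempfile

# Hypothetical layout: one real CSV file plus a subfolder whose name ends in ".csv".
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "sales.csv"), "w") as handle:
        handle.write("item,qty\napple,3\n")
    os.mkdir(os.path.join(folder, "archive.csv"))

    # The pattern matches on names only, so the directory is returned too.
    matches = glob.glob(os.path.join(folder, "*.csv"))
    # Keep only entries that are regular files.
    files_only = [m for m in matches if os.path.isfile(m)]

    matched_names = sorted(os.path.basename(m) for m in matches)
    file_names = sorted(os.path.basename(m) for m in files_only)

print(matched_names)  # both entries match the pattern
print(file_names)     # only the regular file remains
```

The glob pattern cannot tell files from directories, so the filter has to happen after matching.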

import glob
import os
import pandas as pd

# Keep only regular files; a directory whose name ends in ".csv" also matches the pattern.
files_only = [file for file in glob.glob('/Users/user/desktop/Sales/2018-05-03/*.csv') if not os.path.isdir(file)]
list_ = []
for file in files_only:
    df = pd.read_csv(file, index_col=None, header=0)
    df['filename'] = os.path.basename(file)  # record which file each row came from
    list_.append(df)
sale_df = pd.concat(list_)  # raises ValueError if files_only is empty
sale_df
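The "No objects to concatenate" error from the question appears whenever the list passed to pd.concat is empty. As a hedged sketch, the loop can be wrapped in a helper that skips directories and returns an empty DataFrame when nothing matches. The function name load_csv_folder and the demo file names are hypothetical and are not part of the original answer.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csv_folder(folder):
    """Read every regular *.csv file directly inside folder into one DataFrame."""
    frames = []
    for csv_path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        if not os.path.isfile(csv_path):
            continue  # skip directories whose names end in ".csv"
        df = pd.read_csv(csv_path, index_col=None, header=0)
        df["filename"] = os.path.basename(csv_path)  # source file for each row
        frames.append(df)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo on a temporary folder with two CSV files and one ".csv" directory.
with tempfile.TemporaryDirectory() as folder:
    for name, rows in [("a.csv", "item,qty\napple,3\n"), ("b.csv", "item,qty\npear,5\n")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(rows)
    os.mkdir(os.path.join(folder, "old.csv"))

    sale_df = load_csv_folder(folder)
    empty_df = load_csv_folder(os.path.join(folder, "old.csv"))

print(sale_df)
```

Returning an empty DataFrame is one possible design choice; raising a clearer error message would work equally well if an empty folder should be treated as a mistake.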

Answer 2

You call allFiles = glob.glob(path + "/*.csv") even though your path variable already ends with a forward slash. That way, the call ends up as allFiles = glob.glob("/Users/user/desktop/Sales//*.csv").
See if fixing that helps with your error.
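One way to avoid the doubled slash, sketched here as a suggestion rather than as part of the original answer, is to build the pattern with os.path.join, which does not add a second separator when the folder already ends with one.

```python
import os

path = "/Users/user/desktop/Sales/"  # folder from the question, trailing slash included

concatenated = path + "/*.csv"        # string concatenation doubles the slash
joined = os.path.join(path, "*.csv")  # join adds a separator only when needed

print(concatenated)
print(joined)
```

The joined pattern can then be passed to glob.glob in place of path + "/*.csv".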

Answer 1
2 Accepted 2018-05-03 11:12:48

Answer 2
0 2018-05-03 11:29:06
