Is there a way to load data from all files in a directory using Python?
My question: Is there a way to load data from all files in a directory using Python?
Input: Get all files in a given directory of mine (wow.txt, testting.txt, etc.)
Process: I want to run all the files through a function defined with def.
Output: I want the output to be all the file names, each with its respective content below it. For example:
/home/file/wow.txt
"all of its content"
/home/file/www.txt
"all of its content"
Here is my code:
# Import Functions
import os
import sys

# Define the file path
path = "/home/my_files"
file_name = "wow.txt"

# Load Data Function
def load_data(path, file_name):
    """
    Input  : path and file_name
    Purpose: loading text file
    Output : list of paragraphs/documents and
             title (initial 100 words considered as title of document)
    """
    documents_list = []
    titles = []
    with open(os.path.join(path, file_name), "rt", encoding='latin-1') as fin:
        for line in fin.readlines():
            text = line.strip()
            documents_list.append(text)
    print("Total Number of Documents:", len(documents_list))
    titles.append(text[0:min(len(text), 100)])
    return documents_list, titles

# Output
load_data(path, file_name)
Here is my output:
My problem is that my output only takes one file and shows its content. Obviously, I defined the path and file name in my code for a single file, but I am confused about how to write the path so that all the files are loaded and each file's content is output separately.
Any suggestions?
Try this:
import glob
for file in glob.glob("test/*.xyz"):
    print(file)
This assumes the directory is named "test" and contains lots of .xyz files.
import glob
files = glob.glob("*.txt")  # get all the .txt files in the current directory
for file in files:  # iterate over the list of files
    with open(file, "r") as fin:  # open the file
        content = fin.read()  # read the whole file; the rest of the code goes here
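To connect the glob loop to the output format requested in the question (each file name followed by its content), here is a minimal sketch. The function names read_all_files and print_all_files, the "*.txt" default pattern, and the latin-1 encoding (borrowed from the question's code) are assumptions for illustration, not part of any answer above.

```python
import glob
import os


def read_all_files(directory, pattern="*.txt", encoding="latin-1"):
    """Return a list of (path, content) pairs for every matching file."""
    results = []
    # sorted() makes the order predictable; glob itself gives no ordering guarantee
    for file_path in sorted(glob.glob(os.path.join(directory, pattern))):
        if not os.path.isfile(file_path):
            continue  # skip directories whose names happen to match the pattern
        with open(file_path, "rt", encoding=encoding) as fin:
            results.append((file_path, fin.read()))
    return results


def print_all_files(directory, pattern="*.txt"):
    """Print each file name followed by that file's content."""
    for file_path, content in read_all_files(directory, pattern):
        print(file_path)
        print(content)
```

Calling print_all_files("/home/my_files") would print every .txt file in that directory in this name-then-content layout; returning the pairs from read_all_files keeps the data available for further processing instead of only printing it.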
Using os.listdir():
import os
arr = os.listdir()  # names of the entries in the current directory
files = [x for x in arr if x.endswith('.txt')]
for file in files:  # iterate over the list of files
    with open(file, "r") as fin:  # open the file
        content = fin.read()  # read the whole file; the rest of the code goes here
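One detail worth spelling out: os.listdir() without an argument lists the current working directory, and it returns bare names rather than full paths. If the files live somewhere else, each name has to be joined back onto the directory before it can be opened. A minimal sketch, where list_txt_paths is a hypothetical helper name:

```python
import os


def list_txt_paths(directory, suffix=".txt"):
    """Return full paths of the entries in `directory` whose names end with `suffix`."""
    # os.listdir() returns bare names, so each one is joined back onto the directory
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )
```

The returned paths can be passed straight to open(), regardless of which directory the script is run from.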
You can use glob and pandas:
import glob
import pandas as pd

path = r'some_directory'  # use your path
all_files = glob.glob(path + "/*.txt")
li = []
for filename in all_files:
    # read the file here
    # if you decide to use pandas you might need to use the 'sep' parameter as well
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
# get it all together
frame = pd.concat(li, axis=0, ignore_index=True)
I will take advantage of the function you have already written, so use the following:
data = []
path = "/home/my_files"
dirs = os.listdir(path)
for file in dirs:
    data.append(load_data(path, file))
In this case you will have all the data in the list data.
Hi, you can use a for loop over the result of os.listdir:
os.listdir(<path of your directory>)
This gives you the list of files in your directory, but it also gives you the names of the folders in that directory.
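Because folder names come back alongside file names, opening every entry would fail on the folders. One way to leave them out, sketched here with os.scandir (a swapped-in standard-library alternative to os.listdir, not what the answer itself uses), is to keep only the entries that are regular files. The name list_files_only is hypothetical.

```python
import os


def list_files_only(directory):
    """Return the full paths of regular files in `directory`, skipping subfolders."""
    with os.scandir(directory) as entries:
        # entry.is_file() is False for folders, so they are filtered out here
        return sorted(entry.path for entry in entries if entry.is_file())
```

The same filter can be written with os.listdir plus os.path.isfile on the joined path; os.scandir just makes the file/folder check available on each entry directly.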
Try generating a file list first, then passing that to a modified version of your function.
import os
import re

def dir_recursive(dirName):
    # Collect the path of every file below dirName, including subdirectories
    fileList = []
    for (dirPath, _, files) in os.walk(dirName):
        for f in files:
            path = os.path.join(dirPath, f)
            if os.path.exists(path):
                fileList.append(path)
    # Keep only the paths ending in ".txt" (the dot is escaped so it matches literally)
    prog = re.compile(r'\.txt$')
    fList = []
    for filePath in fileList:
        if prog.search(filePath):
            fList.append(filePath)
    return fList

def load_data2(file_list):
    documents_list = []
    titles = []
    for file_path in file_list:
        with open(file_path, "rt", encoding='latin-1') as fin:
            for line in fin.readlines():
                text = line.strip()
                documents_list.append(text)
    print("Total Number of Documents:", len(documents_list))
    titles.append(text[0:min(len(text), 100)])
    return documents_list, titles

# Generate a file list & load the data from it
# (path is the directory defined in the question's code)
file_list = dir_recursive(path)
documents_list, titles = load_data2(file_list)
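The file-list step can probably be written more compactly. The sketch below does the same recursive search with os.walk but filters on str.endswith instead of a regular expression; find_files is a hypothetical name, and the result is sorted only to make the order predictable.

```python
import os


def find_files(root, suffix=".txt"):
    """Return the paths of all files under `root` (recursively) ending with `suffix`."""
    matches = []
    for current_dir, _subdirs, file_names in os.walk(root):
        for name in file_names:
            if name.endswith(suffix):
                matches.append(os.path.join(current_dir, name))
    return sorted(matches)
```

The resulting list should be usable in place of the output of dir_recursive, for example load_data2(find_files(path)).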