How to make a list of files in a directory and process them one by one? - Python
I want to make a list of all the text files in a directory. Then I want to create a separate list of the contents of each file, e.g. document1=[] and then document2=[] and so on. Then, using the document1 and document2 names, I want to calculate term frequency and run other processing. The code runs, but I can't assign a different name (document1, document2, and so on) to each list.
import glob
import math
import re

a = 0
# get all the .txt files from the folder
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
# open each file >> tokenize the content >> and store it in a list
for fname in flist:
    tfile = open(fname, "r")
    line = tfile.read()
    a += 1
    line = line.lower()  # lowercase
    line = re.sub("</?.*?>", " <> ", line)  # remove tags
    line = re.sub("(\\d|\\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split("\n")
    print('document')
    print(l_ist)
    tfile.close()  # close the file
print("Number of documents:")
print(a)
You can assign the list you create in each iteration to a dict indexed by the file name:
import glob
import math
import re

# get all the .txt files from the folder
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
content = {}
# open each file >> tokenize the content >> and store it in the dict
for fname in flist:
    tfile = open(fname, "r")
    line = tfile.read()
    line = line.lower()  # lowercase
    line = re.sub("</?.*?>", " <> ", line)  # remove tags
    line = re.sub("(\\d|\\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split("\n")
    print('document')
    print(l_ist)
    content[fname] = l_ist  # key the list by its file name
    tfile.close()  # close the file
print("Number of documents:")
print(len(content))
print(content)  # to verify the content of the entire dict
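One thing to watch: the second re.sub replaces every non-word character, newlines included, with a space, so split("\n") hands back the whole file as a single string. If per-word lists are what you need for term frequency, something like the following might work. It is only a sketch and assumes Python 3 and UTF-8 files; the function names (tokenize, load_documents, term_frequency) are made up for illustration and are not part of the original code.

```python
import glob
import os
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop tags, digits and punctuation, and split into words."""
    text = text.lower()
    text = re.sub(r"</?.*?>", " ", text)   # remove tags
    text = re.sub(r"(\d|\W)+", " ", text)  # digits and non-word characters become spaces
    return text.split()                    # split on any run of whitespace


def load_documents(folder):
    """Return {file name: list of words} for every .txt file in folder."""
    documents = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        # 'with' closes the file even if reading fails (UTF-8 is an assumption)
        with open(path, "r", encoding="utf-8") as handle:
            documents[os.path.basename(path)] = tokenize(handle.read())
    return documents


def term_frequency(tokens):
    """Relative frequency of each word within a single document."""
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}
```

With this, load_documents(folder) gives one word list per file, and term_frequency(documents[name]) gives the per-document frequencies, so there is no need for separately named variables.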
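I believe you should give the directory path together with the file-name pattern, not just the text file name. For "document1, document2, ..." use a loop, or, if the number of files is fixed, use those names directly.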
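A loop over the directory pattern can produce the document1, document2, ... labels as dict keys, which avoids creating variables by name. The sketch below is one possible way to do it; number_documents and its pattern and prefix parameters are hypothetical names chosen for illustration, and the numbering simply follows the sorted file order.

```python
import glob
import os


def number_documents(folder, pattern="*.txt", prefix="document"):
    """Return {'document1': path, 'document2': path, ...} for files matching pattern."""
    # sorted() keeps the numbering stable, since glob makes no ordering guarantee
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    return {"%s%d" % (prefix, i): path for i, path in enumerate(paths, start=1)}
```

Each label can then be used to open and process its file in turn, e.g. by iterating over number_documents(folder).items().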