How do I list the files in a directory and process them one by one? - Python

Question

I want to make a list of all the text files in a directory, then create a separate list of the contents of each file, e.g. document1 = [], then document2 = [], and so on. Using document1 and document2, I then want to calculate term frequency and run other processing. The code runs, but I can't assign the lists different names such as document1, document2, and so on.

import glob
import math
import re

a = 0  # number of documents processed
# get all the .txt files from the directory
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
# open each file >> tokenize the content >> and store it in a list
for fname in flist:
    tfile = open(fname, "r")
    line = tfile.read()
    tfile.close()  # close the file
    a += 1
    line = line.lower()  # lowercase
    line = re.sub("</?.*?>", " <> ", line)  # remove tags
    line = re.sub("(\\d|\\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split("\n")
    print('document')
    print(l_ist)
print("Number of documents:")
print(a)

Answer 1

You can assign the list you create in each iteration to a dict indexed by the file name:

import glob
import math
import re

# get all the .txt files from the directory
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
content = {}  # maps each file name to the list built from that file
# open each file >> tokenize the content >> and store it in the dict
for fname in flist:
    with open(fname, "r") as tfile:  # the file is closed when the block ends
        line = tfile.read()
    line = line.lower()  # lowercase
    line = re.sub("</?.*?>", " <> ", line)  # remove tags
    line = re.sub("(\\d|\\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split("\n")
    print('document')
    print(l_ist)
    content[fname] = l_ist
print("Number of documents:")
print(len(content))
print(content)  # to verify the content of the entire dict
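
Since the question mentions term frequency as the next step, here is a minimal sketch of how the per-file dict could feed into that. It is not part of the original answer: the names tokenize and term_frequencies are hypothetical, the files are assumed to be UTF-8, and the text is split on whitespace (the second substitution turns newlines into spaces, so splitting on "\n" would leave a single-element list).

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text, strip tags, and split it into word tokens."""
    text = text.lower()
    text = re.sub(r"</?.*?>", " ", text)   # remove tags
    text = re.sub(r"(\d|\W)+", " ", text)  # remove digits and special characters
    return text.split()                    # split on any whitespace


def term_frequencies(folder):
    """Map each .txt file name in `folder` to a Counter of its tokens."""
    result = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: the files are UTF-8 encoded.
        with path.open("r", encoding="utf-8") as handle:
            result[path.name] = Counter(tokenize(handle.read()))
    return result
```

Calling term_frequencies on the folder returns one Counter per file, so something like tf["some_file.txt"]["word"] gives the raw count of that word in that file.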

Answer 2

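I believe you should not give just the text file names, but the directory path together with the naming pattern. For "document1, document2, ..." use a loop, or, if the number of files is fixed, use that number.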
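
One possible reading of the loop suggestion, as a rough sketch rather than the answerer's actual code: number the files with enumerate and use "document1", "document2", ... as dict keys instead of trying to create separate variable names. The function name load_documents and its pattern parameter are hypothetical, and the files are assumed to be UTF-8.

```python
from pathlib import Path


def load_documents(folder, pattern="*.txt"):
    """Return {"document1": [lines...], "document2": [lines...], ...}."""
    documents = {}
    # Sort the paths so the numbering is stable between runs.
    for index, path in enumerate(sorted(Path(folder).glob(pattern)), start=1):
        # Assumption: the files are UTF-8 encoded.
        lines = path.read_text(encoding="utf-8").splitlines()
        documents["document{}".format(index)] = lines
    return documents
```

With this, documents["document1"] plays the role of the document1 list from the question, and len(documents) gives the number of documents.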

Answer 1
0 2018-09-20 07:37:14

Answer 2
0 2018-09-20 07:39:40
