
How to make a list of files in a directory and process them one by one? - Python

I want to make a list of all text files in a directory. Then I want to create a separate list of the contents of each file, e.g. document1 = [], then document2 = [], and so on. Then, using the keywords of document 1 and document 2, I want to calculate term frequency and do other processing. The code runs, but I can't give the lists different names such as document1, document2, and so on.

import glob
import math
import re

a = 0
# get all the .txt files from the directory
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
# open each file >> tokenize the content >> and store it
for fname in flist:
    with open(fname, "r") as tfile:  # the file is closed automatically after each read
        line = tfile.read()
    a += 1
    line = line.lower()  # lowercase
    line = re.sub(r"</?.*?>", " <> ", line)  # remove tags
    line = re.sub(r"(\d|\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split()  # split into word tokens (newlines were already replaced by spaces)
    print('document')
    print(l_ist)
print("Number of documents:")
print(a)
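To see what the cleanup steps in the loop produce on their own, here is a minimal sketch that wraps the same two regexes in a hypothetical `tokenize` helper (the name is illustrative, not from the question). It splits on whitespace rather than on "\n", because the second substitution has already turned every newline into a space.

```python
import re


def tokenize(text):
    """Hypothetical helper: apply the loop's cleanup and return a list of words."""
    text = text.lower()                      # lowercase
    text = re.sub(r"</?.*?>", " <> ", text)  # replace tags with a placeholder
    text = re.sub(r"(\d|\W)+", " ", text)    # collapse digits/non-word chars to one space
    return text.split()                      # split on runs of whitespace


print(tokenize("<p>Hello World</p>\nIt is 2024, hello again!"))
# → ['hello', 'world', 'it', 'is', 'hello', 'again']
```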

You can assign the list you create in each iteration to a dict indexed by the file name:

import glob
import math
import re

# get all the .txt files from the directory
flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')
# open each file >> tokenize the content >> and store it in a dict keyed by file name
content = {}
for fname in flist:
    with open(fname, "r") as tfile:  # the file is closed automatically after each read
        line = tfile.read()
    line = line.lower()  # lowercase
    line = re.sub(r"</?.*?>", " <> ", line)  # remove tags
    line = re.sub(r"(\d|\W)+", " ", line)  # remove special characters and digits
    l_ist = line.split()  # split into word tokens
    print('document')
    print(l_ist)
    content[fname] = l_ist  # one list per file, looked up by its path
print("Number of documents:")
print(len(content))
print(content)  # to verify the content of the entire dict
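Once every file's tokens are in `content`, term frequency can be computed per document. This is a minimal sketch using `collections.Counter`, assuming `content` maps each file path to a list of word tokens. The `sample` dict and its file names are made up for illustration, and "term frequency" is taken here as a term's count divided by the document's total number of tokens, which is one common definition among several.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return {term: count / total_tokens} for one document's token list."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {}  # empty document: no terms
    return {term: n / total for term, n in counts.items()}


# Hypothetical data shaped like the `content` dict built above.
sample = {
    "MyFolder/a.txt": ["apple", "banana", "apple", "cherry"],
    "MyFolder/b.txt": ["banana", "banana"],
}

tf = {fname: term_frequencies(tokens) for fname, tokens in sample.items()}
print(tf["MyFolder/a.txt"]["apple"])   # 0.5
print(tf["MyFolder/b.txt"]["banana"])  # 1.0
```

Looking up `tf[fname]` for a given file plays the role that separate `document1`, `document2` variables would have played.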

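Go here. I believe that, rather than giving only the text file names, you should give the directory path together with the naming pattern. For "document1, document2, ...", use a loop, or, if the number of files is fixed, use that number.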
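If the `document1`, `document2`, ... labels are still wanted, they can be generated in a loop as dictionary keys instead of variable names. This is a minimal sketch, assuming the files should be numbered in sorted path order (`glob.glob` itself does not guarantee any particular order); `label_documents` is a hypothetical helper name, and the temporary directory stands in for the real folder.

```python
import glob
import os
import tempfile


def label_documents(paths):
    """Hypothetical helper: map 'document1', 'document2', ... to paths in sorted order."""
    return {f"document{i}": path for i, path in enumerate(sorted(paths), start=1)}


# Demo with throwaway files in a temporary directory (stand-in for MyFolder).
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.txt", "a.txt", "notes.md"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("some text")
    flist = glob.glob(os.path.join(folder, "*.txt"))  # only the .txt files match
    docs = label_documents(flist)
    for label, path in docs.items():
        print(label, os.path.basename(path))
```

`docs["document1"]` then gives the first file's path, which can be used as the key into a dict of per-file token lists.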


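Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.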

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM