Python: Convert all files in directory into one .TXT?
I have been trying to convert a number of DOCX files into TXT.
It works for a single file using the code below:
import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

if __name__ == '__main__':
    filename = '/content/drive/My Drive/path/file.DOCX'  # file name
    fullText = getText(filename)
    print(fullText)
    file = open("copy.txt", "w")
    file.write(fullText)
    file.close()
I tried different options (e.g. glob) but did not manage to get it to perform the above operation on all files in a folder.
Ideally, the output should be one large text file rather than separate ones. In a next step I will need to do some formatting and assign IDs in that file.
Thank you for your help!
corp-alt
With file = open("copy.txt", "w") you open the file and replace its content with write().
With file = open("copy.txt", "a") you append to the existing file with write().
Or maybe even better:
With file = open("copy.txt", "a+") you append to an existing file with write(), or create a new file if it doesn't exist yet. (Plain "a" creates a missing file as well; "a+" additionally opens the file for reading.)
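The difference between "w" and "a" can be checked with a small, self-contained sketch. The helper write_twice and the file names w_demo.txt and a_demo.txt are made up for illustration, and everything happens in a temporary directory:

```python
import os
import tempfile


def write_twice(path, mode):
    """Write two lines to path, reopening the file with `mode` each time."""
    for line in ("first\n", "second\n"):
        with open(path, mode) as f:
            f.write(line)
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # hypothetical file names, used only for this demonstration
    overwritten = write_twice(os.path.join(tmp, "w_demo.txt"), "w")
    appended = write_twice(os.path.join(tmp, "a_demo.txt"), "a")

print(repr(overwritten))  # with "w", only the last write survives
print(repr(appended))     # with "a", both writes are kept
```

Note that with "a" the text also accumulates across separate runs of the script, so copy.txt keeps growing unless it is deleted or opened once with "w".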
To go through all files in a folder you can loop over them:
import os
import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

if __name__ == '__main__':
    foldername = '/content/drive/My Drive/path/'  # folder name
    all_files = os.listdir(foldername)  # get all filenames
    # keep only .docx files (case-insensitive, so file.DOCX matches too)
    docx_files = [filename for filename in all_files if filename.lower().endswith('.docx')]
    file = open("copy.txt", "a+")
    for docx_file in docx_files:  # loop over .docx files
        # os.listdir returns bare names, so join each one with the folder
        fullText = getText(os.path.join(foldername, docx_file))
        file.write(fullText + '\n')  # newline keeps documents from running together
    file.close()
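Since the question mentions glob, here is one possible variant of the same loop using glob and a with block, offered as a sketch rather than the definitive way to do it. The names read_docx and merge_to_txt and the reader parameter are hypothetical and not part of python-docx. The output is opened with "w" so that every run starts from an empty file:

```python
import glob
import os


def read_docx(path):
    """Return the paragraph text of one .docx file (requires python-docx)."""
    import docx  # imported lazily so merge_to_txt can be used without it

    doc = docx.Document(path)
    return "\n".join(para.text for para in doc.paragraphs)


def merge_to_txt(folder, out_path, pattern="*.docx", reader=read_docx):
    """Concatenate the text of every matching file in folder into out_path.

    `reader` is a hypothetical hook: any function mapping a path to a string.
    """
    # glob.escape protects special characters such as [ ] in the folder name
    paths = sorted(glob.glob(os.path.join(glob.escape(folder), pattern)))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(reader(path))
            out.write("\n")  # separate consecutive documents
    return paths
```

It would be called as merge_to_txt('/content/drive/My Drive/path/', 'copy.txt'). On a case-sensitive file system the pattern "*.docx" does not match names ending in ".DOCX", so a second call with pattern="*.DOCX" and a different output file may be needed for such files.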