Python: Convert all files in directory into one .TXT?

I have been trying to convert a number of DOCX files into TXT.

It works for a single file with the code below:

import docx

def getText(filename):
    # Collect the text of every paragraph in the document.
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

if __name__ == '__main__':
    filename = '/content/drive/My Drive/path/file.DOCX'  # file name
    fullText = getText(filename)
    print(fullText)

    # Write the extracted text to a .txt file.
    with open("copy.txt", "w") as file:
        file.write(fullText)

I tried different options (e.g. glob) but did not manage to get it to do the above operation on all files in a folder.

Ideally the output should be one large text file and not separate ones. In a next step I will need to do some formatting and assign IDs in that file.

Thank you for your help!

With file = open("copy.txt", "w") you open the file and replace its content with write().

With file = open("copy.txt", "a") you append to the existing file with write(), and the file is created if it doesn't exist yet.

With file = open("copy.txt", "a+") you get the same append behavior and can also read from the file.
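To see the difference between the modes, here is a small sketch that writes to a scratch file in a temporary directory. The function name demo_modes and the scratch file are only for illustration and are not part of the question's code.

```python
import os
import tempfile

def demo_modes():
    """Write with 'w' twice, then with 'a', and return the final file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "copy.txt")  # hypothetical scratch file
        with open(path, "w") as f:   # creates the file
            f.write("first\n")
        with open(path, "w") as f:   # "w" truncates: "first" is lost
            f.write("second\n")
        with open(path, "a") as f:   # "a" keeps the content and adds to the end
            f.write("third\n")
        with open(path) as f:
            return f.read()

print(demo_modes())
```

Only the lines "second" and "third" remain, because the second "w" open discarded what was written before it.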

To go through all files in a folder you can loop over them:

import os
import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

if __name__ == '__main__':
    foldername = '/content/drive/My Drive/path/'  # folder name
    all_files = os.listdir(foldername)  # get all filenames
    # keep .docx files; lower() also matches an upper-case .DOCX extension
    docx_files = [filename for filename in all_files if filename.lower().endswith('.docx')]

    with open("copy.txt", "a+") as file:
        for docx_file in docx_files:  # loop over .docx files
            # pass the full path of the current file, not just its name
            fullText = getText(os.path.join(foldername, docx_file))
            file.write(fullText + '\n')  # newline separates consecutive documents
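Since the question mentions glob, here is a possible variant using pathlib. It is a sketch, not a tested drop-in: the names get_text, find_docx and combine are made up for this example, and it assumes the python-docx package is installed. It opens the output with "w", so each run starts from an empty file instead of appending to an earlier result.

```python
from pathlib import Path

def get_text(path):
    """Read the paragraph text of one .docx file (needs python-docx)."""
    import docx  # imported here so the helpers below also work without it
    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)

def find_docx(folder):
    """Return the .docx files in folder, sorted by name, in any letter case."""
    return sorted(
        p for p in Path(folder).glob("*")
        if p.is_file() and p.suffix.lower() == ".docx"
    )

def combine(folder, out_path, reader=get_text):
    """Write the text of every .docx in folder to out_path; return the file count."""
    paths = find_docx(folder)
    with open(out_path, "w", encoding="utf-8") as out:
        for p in paths:
            out.write(reader(p) + "\n")  # newline separates consecutive documents
    return len(paths)
```

With the folder from the question, the call would be combine('/content/drive/My Drive/path/', 'copy.txt'). Sorting the paths keeps the order of documents in the output stable between runs, which may help when assigning IDs later.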
