
Python: Convert all files in directory into one .TXT?

I have been trying to convert a number of DOCX files into TXT.

It works for a single file using the code below:

import docx    
def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

if __name__ == '__main__':
    filename = '/content/drive/My Drive/path/file.DOCX'  # file name
    fullText = getText(filename)
    print(fullText)

    # write the text of this one document to copy.txt
    file = open("copy.txt", "w")
    file.write(fullText)
    file.close()

I tried different options (e.g. glob) but did not manage to get it to do the above operation on all files in a folder.
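On the glob attempt: glob patterns are case-sensitive on Linux (which is what Colab runs on), so a pattern like *.docx would not match a file named file.DOCX. A minimal sketch that collects the files instead by checking the lower-cased suffix, assuming all documents sit directly in one folder (no subfolders); the function name find_docx_files is hypothetical:

```python
from pathlib import Path


def find_docx_files(folder):
    """Return the .docx files directly inside `folder`, sorted by path.

    The suffix check is case-insensitive, so both 'a.docx' and 'B.DOCX' match.
    Subdirectories are skipped, even if their name ends in '.docx'.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == '.docx'
    )
```

Each returned Path can be passed straight to docx.Document(), e.g. in a loop such as for path in find_docx_files('/content/drive/My Drive/path/').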

Ideally the output should be one large text file, not separate ones. In a later step I will need to do some formatting and assign IDs in that file.

Thank you for your help!

With file = open("copy.txt", "w") the file is emptied as soon as it is opened, so whatever you then write() replaces its previous content.

With file = open("copy.txt", "a") each write() appends to the end of the existing file, and the file is created first if it doesn't exist yet.

With file = open("copy.txt", "a+") you get the same appending behavior, plus the ability to read from the file.
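A quick way to see the difference between the modes is a small experiment on a throwaway file in a temporary directory (the helper write_twice is hypothetical, written only for this demonstration). Opening with "w" twice keeps only the last write, while "a" and "a+" keep both, and all three create the file when it is missing:

```python
import os
import tempfile


def write_twice(mode):
    """Open the same (initially missing) file twice with `mode`,
    write one line each time, and return the final contents."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'copy.txt')  # does not exist yet
        for text in ('first\n', 'second\n'):
            with open(path, mode) as f:
                f.write(text)
        with open(path) as f:
            return f.read()


print(repr(write_twice('w')))   # 'second\n'
print(repr(write_twice('a')))   # 'first\nsecond\n'
print(repr(write_twice('a+')))  # 'first\nsecond\n'
```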

To go through all files in a folder you can loop over them:

import os
import docx    

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

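if __name__ == '__main__':
    foldername = '/content/drive/My Drive/path/'  # folder name
    all_files = sorted(os.listdir(foldername))  # get all filenames, in a stable order
    # keep only .docx files; lower() makes 'file.DOCX' match as well
    docx_files = [filename for filename in all_files if filename.lower().endswith('.docx')]

    # 'a+' appends, so re-running the script adds the documents to copy.txt again
    with open("copy.txt", "a+", encoding="utf-8") as file:
        for docx_file in docx_files:  # loop over .docx files
            # os.listdir() returns bare names, so join them with the folder path
            fullText = getText(os.path.join(foldername, docx_file))
            file.write(fullText)
            file.write('\n')  # separate consecutive documents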
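Since the combined file is going to be formatted and given IDs later, it may help to write a marker line before each document so the pieces can be told apart. A sketch under these assumptions: the marker format "### <id> <filename>" and the function name combine_documents are hypothetical choices, and the text extractor is passed in as a parameter (for example the getText function above), so the function itself needs only the standard library. Opening the output once with "w" means re-running the script rebuilds copy.txt instead of appending duplicates.

```python
import os


def combine_documents(paths, out_path, extract_text):
    """Write the text of every file in `paths` into the single file `out_path`.

    Each document is preceded by a hypothetical marker line
    '### <id> <filename>' (ids count from 1) and followed by a blank line.
    Returns the number of documents written.
    """
    count = 0
    with open(out_path, 'w', encoding='utf-8') as out:
        for doc_id, path in enumerate(paths, start=1):
            out.write(f'### {doc_id} {os.path.basename(path)}\n')
            out.write(extract_text(path))
            out.write('\n\n')
            count += 1
    return count
```

With the names from the answer's code this would be called as combine_documents([os.path.join(foldername, f) for f in docx_files], 'copy.txt', getText).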
