
Search word in word documents and print out the file name that contains that word?

Hey, so I am new to Python and I want to make a script that goes through the docx documents in a large directory and prints the name of each file that contains a certain word.

Here is my code below so far

import os
import docx2txt
os.chdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES')
text= ''
files = []
for file in os.listdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES'):
    if file.endswith('.docx'):
        files.append(file)
for i in range(len(files)):
        text += docx2txt.process(files[i])
if text == str('VENTILATION RATIO'):
    print (i)

My thought process is to convert all these docx documents to text, then search that text for the phrase 'VENTILATION RATIO'. If the phrase exists in a file, the name of that file should print.

However, the script doesn't print anything. I know for a fact that at least one of the Word documents contains 'VENTILATION RATIO' (and yes, it is case sensitive).

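There is a logic issue in your code. The line with text += joins the text of every file into one big string, and text == 'VENTILATION RATIO' is only true when that whole string is exactly the phrase, so nothing prints. Check each file on its own and use the in operator to test whether the phrase appears anywhere in its text.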
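To see the difference between the two tests in isolation, here is a small sketch. The two strings are made-up stand-ins for what docx2txt.process() might return; they are not taken from your files.

```python
# Made-up stand-ins for the text extracted from two documents (assumption).
doc_a = "Summary\nVENTILATION RATIO: 0.8\n"
doc_b = "Nothing relevant here.\n"

# This is what repeated `text +=` builds up across files.
combined = doc_a + doc_b

# Equality only holds if the whole string is exactly the phrase.
print(combined == 'VENTILATION RATIO')  # False

# Membership checks whether the phrase occurs anywhere in one file's text.
print('VENTILATION RATIO' in doc_a)     # True
print('VENTILATION RATIO' in doc_b)     # False
```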

Try this update:

import os
import docx2txt
os.chdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES')
text= ''
files = []
for file in os.listdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES'):
    if file.endswith('.docx'):
        files.append(file)
for i in range(len(files)):
    text = docx2txt.process(files[i])  # text for single file
    if 'VENTILATION RATIO' in text:
        print(i, files[i])  # file index and name
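If you want to reuse this, the same per-file check could be wrapped in a function. This is only a sketch: the name find_docx_containing and the extract_text parameter are my own inventions, not part of docx2txt. By default it falls back to docx2txt.process, as in the code above. It also skips files whose names start with ~$, which are the temporary lock files Word leaves next to an open document and which are not real docx files.

```python
from pathlib import Path


def find_docx_containing(folder, phrase, extract_text=None):
    """Return the sorted names of .docx files in `folder` whose text contains `phrase`.

    `extract_text` is a hypothetical hook: any callable that takes a file path
    and returns the document text. It defaults to docx2txt.process.
    """
    if extract_text is None:
        # Imported here so the function can also be tried with a stand-in extractor.
        import docx2txt
        extract_text = docx2txt.process

    matches = []
    for path in sorted(Path(folder).glob('*.docx')):
        if path.name.startswith('~$'):
            continue  # Word's temporary lock file, not a real document
        if phrase in extract_text(str(path)):  # case-sensitive substring test
            matches.append(path.name)
    return matches
```

Calling find_docx_containing('C:/Users/epicr/Desktop/Python Stuff/LAB FILES', 'VENTILATION RATIO') should then give you a list of matching file names that you can print or loop over, without needing os.chdir first.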


 