Highlight words in a docx file and display in which line they appear and how many times in each, read 2 docx in python

My Python program should read two files: one containing the words to search for, and another in which the program should look those words up.

I already found out how to open one file, but I cannot get the program to look up the words from the first file.

import sys

print('\n\n')
print('Name of the file to analyse?')
# Use the sample file only when the user just presses Enter,
# instead of always overwriting the answer.
infile_name = input().strip() or './archivos/poemas.txt'

# Open the input file for reading.
print('\nName of the file to analyse: ', infile_name)
infile = open(infile_name, 'r')

# Open the output file for writing (assumes a 4-character extension such as ".txt").
outfile_name = infile_name[:-4] + '_review.html'
print('Results in file:             ', outfile_name)
outfile = open(outfile_name, 'w')


# Write the HTML header once; each string literal stays on a single line.
outfile.writelines([
    '<HTML>\n<HEAD>\n<TITLE> Final Project </TITLE>\n</HEAD>\n\n',
    '<BODY  BGCOLOR="#FFFFFF"  BACKGROUND="./images/ITESMwatermark.png">\n<BR><P>\n<BR><P>\n\n\n\n\n',
])


i = 1
with infile:
    for linea in infile:
        # Pad the line number so numbers of up to three digits line up.
        if len(str(i)) == 1:
            dm1 = '&nbsp; &nbsp; &nbsp;'
        elif len(str(i)) == 2:
            dm1 = '&nbsp; &nbsp;'
        elif len(str(i)) == 3:
            dm1 = '&nbsp;'
        else:
            dm1 = ''  # four or more digits: no padding

        # Placeholder formatting: bold every 8th line, highlight every other 4th line.
        if i % 8 == 0:
            linea = '<b>' + linea + '</b>'
        elif i % 4 == 0:
            linea = '<span style="background-color: #FFFF00"><b><font color="red">' + linea + '</font></b></span>'

        linea_out = 'linea: ' + dm1 + str(i) + '&nbsp; &nbsp; &nbsp; &nbsp;' + linea + '<BR>\n'
        outfile.write(linea_out)
        i += 1


linea_out = "\n\n<BR><P></BODY></HTML>"
outfile.write(linea_out)
outfile.close()
# infile was already closed when the with block ended.
print("\n\n")

I expect it to return another docx file with the words highlighted, showing on which line each word appears and how many times.

You first need to open the file containing the look-up words, in addition to the text file you have already opened for searching. For each look-up word, go through all the lines of the text. Store a hashmap with the look-up words as keys and, as each value, the list of line numbers where the word appears. Then open the output file and, for each word in the map, write the word, its line numbers, and the length of that list. It's not that efficient, though.
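The steps above could be sketched roughly as follows. This is only an illustration, not the asker's final program: the function names `find_words` and `highlight` are made up for the example, the sample lists stand in for the two input files, and matching is assumed to be whole-word and case-insensitive. As a small variation on the list of line numbers, each word maps to a dict of line number to count, so the "how many times in each line" part of the question is covered too.

```python
import re
from collections import defaultdict


def find_words(words, lines):
    """Map each look-up word to {line_number: occurrences on that line}."""
    hits = defaultdict(dict)
    for word in words:
        # Whole-word, case-insensitive match; re.escape guards special characters.
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        for number, line in enumerate(lines, start=1):
            count = len(pattern.findall(line))
            if count:
                hits[word][number] = count
    return dict(hits)


def highlight(line, words):
    """Wrap every look-up word found in the line in a yellow <span>."""
    if not words:
        return line
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(w) for w in words) + r')\b', re.IGNORECASE)
    return pattern.sub(
        r'<span style="background-color: #FFFF00">\1</span>', line)


# Sample data standing in for the two input files (hypothetical content).
words = ['rose', 'sea']
lines = ['A rose is a rose', 'The sea is calm', 'Nothing here']

for word, per_line in find_words(words, lines).items():
    for number, count in per_line.items():
        print(f'{word}: line {number}, {count} time(s)')
print(highlight(lines[0], words))
```

With real files, `words` and `lines` would come from reading the two inputs, for example one word per line in the first file. For actual .docx input, the paragraphs exposed by the python-docx package (`Document(path).paragraphs`, each with a `.text` attribute) could probably play the role of the lines, but that is an assumption and not something the original code does.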
