Search for multiple words in a PDF

Question

I'm trying to write a Python script that will find specific words in PDF files. Right now I have to scroll through the results to find the lines where the words occur.

I want only the lines containing the words to be printed, or saved as a separate file.

# import packages
import re
from PyPDF2 import PdfReader

# open the pdf file
reader = PdfReader("Filename.pdf")

# get number of pages
NumPages = len(reader.pages)

# define key terms
Strings = "House|Property|street"

# extract text and do the search
for i in range(NumPages):
    PageObj = reader.pages[i]
    print("this is page " + str(i))
    Text = PageObj.extract_text()
    # print(Text)
    ResSearch = re.search(Strings, Text)
    print(ResSearch)

When I run the above code, I need to scroll through the output to find the lines where the words are found. I expect the lines containing the words to be printed or saved as a separate file, or the page containing the line to be saved on its own as a separate PDF or TXT file. Thanks in advance for the help.
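Part of why this output is hard to scan: re.search stops at the first match in the whole page text and returns a re.Match object (or None), so each page prints at most one hit and no surrounding line. A small illustration on a made-up string (the sample text is hypothetical, standing in for one page of extracted text):

```python
import re

# Hypothetical page text standing in for extracted PDF text
page_text = "Lot 12\nThe House is on Main street\nProperty tax due"

pattern = "House|Property|street"

# re.search returns only the first match on the page, as a Match object
first = re.search(pattern, page_text)
print(first)  # <re.Match object; span=(11, 16), match='House'>

# re.findall returns every keyword occurrence, but still without line context
print(re.findall(pattern, page_text))  # ['House', 'street', 'Property']
```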

Answer 1

You could split the text of each page into lines and test each line with re.search.
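One caveat on the choice of function: re.match only succeeds if the pattern matches at the very start of the string, while re.search scans the whole line. Once the page is split into lines, that difference decides whether a keyword in the middle of a line is found. A quick check on two made-up lines:

```python
import re

pattern = "House|Property|street"
lines = ["House for sale", "The House on Main street"]

for line in lines:
    # re.match anchors at position 0; re.search looks anywhere in the line
    print(repr(line), bool(re.match(pattern, line)), bool(re.search(pattern, line)))
# 'House for sale' True True
# 'The House on Main street' False True
```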

As an example:

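import re
from PyPDF2 import PdfReader

reader = PdfReader("Filename.pdf")
for page in reader.pages:
    text = page.extract_text()
    # split the page text into lines and keep only those with a keyword
    for line in text.splitlines():
        if re.search("House|Property|street", line):
            print(line)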
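For the "save as a separate file" part of the question, the same loop can collect the hits instead of printing them. A minimal sketch, assuming the page texts have already been extracted into a list of strings; the helper names find_matching_lines and save_hits, the sample pages, and the output filename matches.txt are all hypothetical:

```python
import re
from typing import Iterable, List, Tuple


def find_matching_lines(pages: Iterable[str], pattern: str) -> List[Tuple[int, str]]:
    """Return (page_number, line) pairs for every line that contains a keyword."""
    regex = re.compile(pattern)
    hits = []
    for page_no, text in enumerate(pages):
        for line in text.splitlines():
            # search (not match) so keywords in the middle of a line count too
            if regex.search(line):
                hits.append((page_no, line))
    return hits


def save_hits(hits: List[Tuple[int, str]], path: str) -> None:
    """Write one 'page N: line' entry per hit to a plain-text file."""
    with open(path, "w", encoding="utf-8") as out:
        for page_no, line in hits:
            out.write(f"page {page_no}: {line}\n")


# Hypothetical page texts; with PyPDF2 these would come from extract_text()
pages = ["Lot 12\nThe House is on Main street", "No keywords here", "Property tax due"]
hits = find_matching_lines(pages, "House|Property|street")
save_hits(hits, "matches.txt")
```

With a real file, the list could be built as `[page.extract_text() for page in PdfReader("Filename.pdf").pages]`. The page numbers stored in `hits` also identify which pages to copy into a new PDF (for example with PyPDF2's PdfWriter) if you would rather save whole pages than individual lines.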

Answer 1 (score: 0), posted 2019-11-04 08:34:52
