Python - Slate3k gives me a TypeError after installing pdfminer

Question

I'm on Python 3.8.3 on Windows 10.

I am working on a PDF parser and initially found slate3k, which works with Python 3.x. I got a basic script working and started testing it on some PDFs. Some of the text was not being parsed properly, so I started looking into PDFMiner.

After reading through the documentation for PDFMiner, I decided to install it and give it a go, as it had some functionality that would be very useful for my use case.

However, I figured out soon after that PDFMiner doesn't work with Python 3.x. I uninstalled it and went back to using slate3k.

When I did this, I started to get a bunch of errors. I uninstalled slate3k and re-installed it, hoping that would fix things, but still got the errors. Re-installing PDFMiner got rid of those errors, but now I'm stuck with the error below and I'm at a loss for what to do next.

Exception has occurred: TypeError __init__() missing 1 required positional argument: 'parser'

Here is the code (please note I haven't done much error trapping and it's still a work in progress; I'm more at the "proof of concept" stage):

import re, os
import slate3k as slate

# variable define
CurWkDir = os.getcwd()
tags= list()
rev= str()
FileName = str()
ProperFileName = str()
parsed = str()

# open file and create if it doesn't exist
xref = open('parsed from pdf xref.csv', 'w+')
xref.write('File Name, Rev, Tag\n')

for files in os.listdir(CurWkDir):

    # find pdf files
    if files.endswith('.pdf'):

        tags.clear()
        rev = ""
        FileName = ""
        ProperFileName = ""

        #extract revision, file name, create proper file name
        rev = re.findall(r'[0-9]{,2}[A-Z]{1}[0-9]{,2}',files)[0]
        FileName = re.findall(r'[A-Z]+[0-9]+-[A-Z]+-[0-9]+-[0-9]+|[A-Z]+[0-9]+-[A-Z]+-[A-Z]+[0-9]+-[0-9]+|[A-Z]+[0-9]+-[A-Z]+-[A-Z]+[0-9]+[A-Z]+-[0-9]+', files)[0]
        ProperFileName = FileName + "(" + rev[0: len(rev) - 1] + ")"

        # Parse through PDF to find tags
        fileopen = open(files, 'rb')
        print("Reading", files)
        raw = slate.PDF(fileopen)
        print("Finished reading", files)
        parsed = raw[0]
        parsedstripped = parsed.replace("\n"," ")
        rawtags = re.findall(r'[0-9]+[A-Z]+-[0-9]+|[0-9]+[A-Z]+[0-9]{1,5}|[0-9]{3}[A-Z]+[0-9]+', parsed, re.I)
        fileopen.close()
        print(parsedstripped)

        for t in rawtags:

            if t not in tags:

                row = ProperFileName + "," + rev + "," + t + "\n"
                xref.write(row)
                tags.append(t)

xref.close()

The error occurs at line 34: raw = slate.PDF(fileopen)

Any insight into what I did to break slate3k is appreciated.

Thanks,

JT

Answer 1

I looked into the dependencies of slate3k by running pip show slate3k and found a couple of packages it depends on.
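For anyone who prefers to do the same check from a script, here is a minimal sketch using the standard-library importlib.metadata module (available from Python 3.8). The helper name declared_requirements is my own, not part of slate3k or pip, and the output depends entirely on what is installed in your environment.

```python
from importlib import metadata  # standard library since Python 3.8


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares.

    Returns None when the distribution is not installed, and an empty
    list when it is installed but declares no requirements.
    """
    try:
        reqs = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return list(reqs or [])


if __name__ == "__main__":
    # The distribution names are the ones mentioned in this thread.
    for name in ("slate3k", "pdfminer3k", "pdfminer"):
        print(name, "->", declared_requirements(name))
```

This reads the same "Requires" metadata that pip show reports, so it should agree with the command-line output.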

I uninstalled slate3k, pdfminer3k and pdfminer, and then re-installed slate3k.

Now everything seems to be working.
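My best guess at the cause, which I have not confirmed: pdfminer and pdfminer3k both install a top-level package named pdfminer, so having both installed lets one overwrite the other, and slate3k then ends up calling a PDFDocument class whose constructor expects a parser argument. The sketch below only detects that situation; the helper names installed_versions and conflicting are made up for illustration, and the list of distribution names is an assumption about which packages share the pdfminer namespace.

```python
from importlib import metadata  # standard library since Python 3.8

# Assumption: each of these distributions ships a top-level "pdfminer" package.
PDFMINER_DISTS = ("pdfminer", "pdfminer.six", "pdfminer3k")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def conflicting(versions):
    """Return the installed names, sorted, if more than one is present; else []."""
    present = sorted(name for name, ver in versions.items() if ver is not None)
    return present if len(present) > 1 else []


if __name__ == "__main__":
    found = installed_versions(PDFMINER_DISTS)
    print("Installed:", found)
    clash = conflicting(found)
    if clash:
        print("These may overwrite each other's pdfminer package:", clash)
    else:
        print("No overlapping pdfminer distributions detected.")
```

If the script reports more than one of these, uninstalling all of them along with slate3k and then re-installing only slate3k is the sequence that worked for me.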

Answer 1 status: accepted, 2020-06-17 20:31:32
