
pdfplumber gives fp.seek(pos) AttributeError: 'dict' object has no attribute 'seek'

So here is my code:

def main():    
    import combinedparser as cp
    from tkinter.filedialog import askopenfilenames

    files = askopenfilenames()
    print(files) #this gives the right files as a list of strings composed of path+filename


    def file_discriminator(func):
        def wrapper():
            results = []
            for item in files:
                if item.endswith('.pdf'):
                    print(item + 'is pdf')
                    func = f1(file = item)
                    results.append(item, Specimen_Output)
                else:
                    print(item + 'is text')
                    func = f2(file = item)
                    results.append(item, Specimen_Output)

        return wrapper


    @file_discriminator
    def parse_me(**functions):
        print(results)


    parse_me(f1 = cp.advparser(), f2 = cp.vikparser())

main()

where combinedparser.py has two functions:

def advparser(**file):
    import pdfplumber
    with pdfplumber.open(file) as pdf:  # opened fname and assigned it to the variable pdf
        page = pdf.pages[0]  # assigned index 0 of pages to the variable page
        text = page.extract_words()
    #followed by a series of python operations generating a dict named Specimen_Output
def vikparser(**file):
    with open(file, mode = 'r') as filename:
        Specimen_Output = {}
    #followed by a series of python operations generating a dict named Specimen_Output 

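I have a directory with PDFs and text files randomly interspersed. I am trying to use the decorator @file_discriminator to run the function advparser, which uses pdfplumber and some follow-up processing to extract usable information from the PDF files in the directory, and vikparser, which does ordinary text-file processing on the text files. Each should produce a dict named Specimen_Output. I get the right results when advparser lives in its own .py file and is run as advparser(file), importing askopenfilename rather than its plural and calling advparser(file = askopenfilename()); the same goes for vikparser (which reads the text file with readlines). But when I try to do this from a main module and call them through a parent function, I can't get it to work. I have tried just about every permutation I can think of, including passing 'file' as a positional versus a keyword argument.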

This is the most common error I hit once I have fixed whatever other errors came up from changing things around:

Traceback (most recent call last):


 File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/main.py", line 29, in <module>
    parse_me(f1 = cp.advparser(), f2 = cp.vikparser())
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/combinedparser.py", line 12, in advparser
    with pdfplumber.open(file) as pdf:  # opened fname and assigned it to the variable pdf
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfplumber/pdf.py", line 48, in open
    return cls(path_or_fp, **kwargs)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfplumber/pdf.py", line 25, in __init__
    self.doc = PDFDocument(PDFParser(stream), password=password)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/pdfparser.py", line 39, in __init__
    PSStackParser.__init__(self, fp)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 502, in __init__
    PSBaseParser.__init__(self, fp)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 172, in __init__
    self.seek(0)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 514, in seek
    PSBaseParser.seek(self, pos)
  File "/Users/zachthomasadmin/PycharmProjects/pythonProject1/venv/lib/python3.8/site-packages/pdfminer/psparser.py", line 202, in seek
    self.fp.seek(pos)
AttributeError: 'dict' object has no attribute 'seek'

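What am I doing wrong? What dict object is it talking about, and why doesn't pdfplumber have this problem when I call each parser on its own with askopenfilename()? I'm a novice coder and have been struggling with this all day. Thanks!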

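The problem is that the file parameter in your advparser and vikparser functions is actually a dictionary of keyword arguments, because it is declared with two asterisks (**file). So when you call these functions like this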

func = f1(file = item)

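the file parameter inside advparser and vikparser is actually equal to {"file": "some_filename.pdf"}, and it is that dict, not the filename string, that gets handed to pdfplumber.open() or open().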
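To see where the dict comes from, here is a minimal sketch of how a ** parameter behaves. The function name show_kwargs is made up for illustration and is not part of the original code. Note that the traceback shows the error being raised from the call cp.advparser() with no arguments at all, in which case file would be an empty dict; either way, pdfplumber receives a dict and later fails when it tries to call .seek() on it.

```python
def show_kwargs(**file):
    # `**file` gathers every keyword argument into a single dict named `file`.
    # (show_kwargs is a hypothetical stand-in for advparser/vikparser.)
    return file


with_argument = show_kwargs(file="some_filename.pdf")
without_argument = show_kwargs()

print(with_argument)     # {'file': 'some_filename.pdf'}
print(without_argument)  # {}
print(type(with_argument).__name__)  # dict
```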

You need to either pull the filename out of that dictionary:

def vikparser(**file):
    with open(file["file"], mode='r') as filename:
        pass

or simply use a single file parameter in the function definition:

def vikparser(file):
    with open(file, mode='r') as filename:
        pass
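With single-parameter parsers, the calling side also has to pass the functions themselves (cp.advparser, not cp.advparser()), otherwise they run immediately with no filename. The following is only a sketch of one way to route files by extension. parse_pdf, parse_text and parse_all are hypothetical names, and the two parsers are stand-ins that return a small dict instead of doing real pdfplumber or text processing. The case-insensitive extension check is also an assumption, since the original code uses item.endswith('.pdf').

```python
def parse_pdf(file):
    # Stand-in for advparser(file); the real one would call pdfplumber.open(file).
    return {"kind": "pdf", "source": file}


def parse_text(file):
    # Stand-in for vikparser(file); the real one would open and read the text file.
    return {"kind": "text", "source": file}


def parse_all(files, f1, f2):
    """Send each path to f1 (PDFs) or f2 (everything else) and collect the results."""
    results = []
    for item in files:
        # Assumption: match the extension case-insensitively.
        parser = f1 if item.lower().endswith(".pdf") else f2
        # Append one (path, output) tuple; list.append takes a single argument.
        results.append((item, parser(item)))
    return results


# Pass the function objects, without parentheses, so they are called per file.
results = parse_all(["a.pdf", "b.txt"], f1=parse_pdf, f2=parse_text)
print(results)
```

In the question's setting, the list of paths would come from askopenfilenames() and the two callables would be cp.advparser and cp.vikparser, each returning its Specimen_Output dict.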
