I am trying to extract text from a PDF file. Below is my code:
import PyPDF2

file_path = "xxx.pdf"
pdfFileObj = open(file_path, 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
print(pdfReader.numPages)  # number of pages in the document
pageObj = pdfReader.getPage(0)  # page indices start at 0
print(pageObj.extractText())
pdfFileObj.close()
After running the code, I kept getting the error message:
---> print(pageObj.extractText())
AttributeError: 'NameObject' object has no attribute 'get_data'
The full output is:
68
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
c:\HL\PDF parsing\pdfparsing pdfminer.ipynb Cell 6 in <cell line: 20>()
17 pageObj = pdfReader.getPage(11)
19 # extracting text from page
---> 20 print(pageObj.extractText())
22 # closing the pdf file object
23 pdfFileObj.close()
File c:\Users\Hlin\Anaconda3\lib\site-packages\PyPDF2\_page.py:1545, in PageObject.extractText(self, Tj_sep, TJ_sep)
1539 """
1540 .. deprecated:: 1.28.0
1541
1542 Use :meth:`extract_text` instead.
1543 """
1544 deprecate_with_replacement("extractText", "extract_text")
-> 1545 return self.extract_text()
File c:\Users\Hlin\Anaconda3\lib\site-packages\PyPDF2\_page.py:1517, in PageObject.extract_text(self, Tj_sep, TJ_sep, orientations, space_width, *args)
1514 if isinstance(orientations, int):
1515 orientations = (orientations,)
-> 1517 return self._extract_text(
1518 self, self.pdf, orientations, space_width, PG.CONTENTS
1519 )
...
(...)
205 .replace(b">>", b"\n}\n") # some solution to find it back
206 )
AttributeError: 'NameObject' object has no attribute 'get_data'
I couldn't find any similar error. The weird thing is that the same code works fine with other files, just not with this specific one.
Does anyone have any idea what could be wrong with my code or the PDF file?
Try reinstalling the PyPDF2 library.
pip uninstall pypdf2
pip install pypdf2
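If reinstalling does not help, it may be worth checking whether the error is tied to particular pages of this one file, since the same code works on other PDFs. The following is only a rough diagnostic sketch, not a fix: it calls extract_text() on every page and records which ones raise instead of stopping at the first failure. The helper names extract_pages_safely and extract_pdf_safely are made up for this example, and it assumes a PyPDF2 version that provides PdfReader, reader.pages and extract_text() (the snake_case replacements that the deprecation notice in the traceback points to).

```python
from typing import Dict, Iterable, Tuple


def extract_pages_safely(pages: Iterable) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Call extract_text() on each page and record failures instead of stopping.

    Returns two dicts keyed by 0-based page index:
    the extracted text, and an error description for pages that raised.
    """
    texts: Dict[int, str] = {}
    failures: Dict[int, str] = {}
    for index, page in enumerate(pages):
        try:
            texts[index] = page.extract_text()
        except Exception as exc:  # deliberately broad: this is a diagnostic pass
            failures[index] = f"{type(exc).__name__}: {exc}"
    return texts, failures


def extract_pdf_safely(file_path: str) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Hypothetical wrapper: open a PDF with PyPDF2 and run the check above."""
    import PyPDF2  # imported here so the helper above does not depend on it

    with open(file_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # All pages are processed before the file is closed.
        return extract_pages_safely(reader.pages)
```

Calling extract_pdf_safely("xxx.pdf") and printing the second dict should show which page indices fail and with what message. If only a few pages are affected, the file itself is the more likely culprit than the installation.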