Newest questions tagged [pdfplumber]

Python - Reset BytesIO So Next File Isn't Appended

I'm having a problem with the BytesIO library in Python. I want to take a PDF file that I have retrieved from an S3 bucket and convert it into a dataf ...
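For the buffer-reuse problem named in the title, a minimal sketch using only the standard library may help. It assumes the symptom comes from reusing one `io.BytesIO` object across files; `reset_buffer` and `fresh_buffer` are hypothetical helper names, not part of the original question.

```python
import io


def reset_buffer(buf: io.BytesIO) -> None:
    """Empty an existing buffer so the next write starts from byte 0."""
    buf.seek(0)      # move the cursor back to the start
    buf.truncate()   # drop everything from the cursor onward


def fresh_buffer(data: bytes) -> io.BytesIO:
    """Alternative: build a new buffer per file instead of reusing one."""
    return io.BytesIO(data)


buf = io.BytesIO()
buf.write(b"first file")
reset_buffer(buf)
buf.write(b"second file")
result = buf.getvalue()  # contains only the second file's bytes
```

Creating a new buffer per file is usually the simpler option; resetting is only worth it when the same object has to be passed around.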

pdfplumber extract table data works when the table has borders, doesn't work when the table has no borders

Using reportlab I made two one-page PDFs with one table: The data in the table is this: The point is to get the rows including the empty cells. If the ...

extracting data into columns using pdfplumber

I have a PDF which has data in tabular format with 6 columns, but the columns are not separated by boundaries, so when I extract the data using pdfpl ...

How to Convert PDF file into CSV file using Python Pandas

I have a PDF file that I need to convert into a CSV file. This is my example PDF file, as a link: https://online.flippingbook.com/view/352975479/ the code u ...

PYTHON - extract list element using keyword

My goal is to extract an element from many lists that are similar to this, taking the elements that are food. The final result would be "Sandwich" by loo ...

how to recognize a graph in pdf using python?

I'm new to PDF parsing. I want to recognize a graph in a PDF file so I can skip it and not extract this type of text. All I know about the PDF is that ...

How to solve (cid:x) pdfplumber python text extraction

I've been working with the pdfplumber library to extract text from PDF documents and it's been fine; however, in the documents I'm working on ...
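As a rough illustration of the `(cid:x)` symptom from the title, here is a sketch of one possible post-processing step. It assumes the placeholders appear literally as `(cid:<number>)` in the extracted string and that simply dropping them is acceptable. It does not recover the missing glyphs, and `strip_cid_tokens` is a hypothetical helper name.

```python
import re

# Matches literal placeholders such as "(cid:3)" or "(cid:127)".
CID_TOKEN = re.compile(r"\(cid:\d+\)")


def strip_cid_tokens(text: str) -> str:
    """Remove (cid:N) placeholders from already-extracted text."""
    return CID_TOKEN.sub("", text)


cleaned = strip_cid_tokens("Hello (cid:3)world(cid:12)")
```

If the characters behind the placeholders matter, the underlying font mapping in the PDF has to be addressed instead; this only tidies the output.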

Is there a way in python to extract only the CORE TEXT (without boxes, footer etc.) from a pdf?

I am trying to extract only the core text from a "rich" pdf document, meaning that it has a lot of tables, graphs, boxes, footers etc. in which I am n ...

how to extract only main text with pdfplumber and ignore image text and tables?

I'm trying to parse any non-scanned PDF and extract only the text, without tables and their comments or pictures and their comments. Just the main text of a pd ...

Issue with ligatures when converting PDF to text in Python (pdfplumber)

I am running into an issue when trying to convert a PDF to text where the ligatures 'fi' 'ff' 'fl' are being converted to an empty space. I have read ...

How to extract radiobutton / checkbox information with python from a pdf-file?

I would like to get the radio-button / checkbox information from a PDF document. I had a look at pdfplumber and PyPDF2 but was not able to find a s ...

How do you get the filename from a `pdfplumber.pdf.PDF`?

I have a function that is passed a pdfplumber.pdf.PDF argument and I need to reference the filename of the PDF. Is there any way to get the filename f ...

Is there a way to extract sentences after bold text in Python?

I have extracted some bold text from a PDF in Python, which works fine, but I also want to extract the sentence, or more than one sentence, after th ...

pdfplumber to_image() OSError: exception: access violation writing 0x0000000000000008 in Windows 10

I was trying to use pdfplumber library in python (ver. 3.10.6) to convert some pdf pages to images but pdfplumber to_image() method throws the followi ...

How to filter text within a certain area using pdfplumber and OpenCV?

I've got a bunch of PDF files which are from conference proceedings. Every PDF file's structure looks like: I used pdfplumber to choose the ch ...

When running pdfplumber in python I got an error --> CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team

I'm using a Python script that extracts the text content of a PDF file using pdfplumber. When running pdfplumber in python I got an error like this ...

extracting images from PDF with page and screen coordinate information

I want to extract images from PDFs retaining a knowledge of their content (page_number and coordinates on page). (Some tools (e.g. pdfminer) only emit ...

extract the specific text from pdfs using python

I have tried different Python libraries to extract specific text from PDFs. I have to extract the text under the heading pdf1 from this PDF; I have to ...

How to extract text and tables with pdfplumber

With the pdfplumber library, you can extract the text of a PDF page, or you can extract the tables from a pdf page. The issue is that I can't seem to ...

how to do complex pdf extraction with regex

I have a PDF file which contains lottery ticket winners. I want to extract all winning tickets according to their prizes. PDF file. I tried this: and ...