Newest questions tagged [tabula-py]

Cannot read PDF Data into Sheets with Gspread-DataFrame

I want to read data from a PDF I downloaded using Tabula into Google Sheets, and when I transfer the data as it was read into Google Sheets, I get an ...

Extracting tables from PDF using tabula-py fails to properly detect rows

Problem: I want to extract a 70-page vocabulary table from a PDF and turn it into a CSV to use in [any vocabulary learning app]. Tabula-py and its rea ...

Gibberish table output in tabula-java for Japanese PDF but works in standalone Tabula

I am trying to extract data from this Japanese PDF using tabula-py (and tabula-java), but the output is gibberish. In both tabula-py and tabula-java, ...

Extracting data into columns using pdfplumber

I have a PDF that contains data in tabular format with 6 columns, but the columns are not separated by boundaries, so when I extract the data using pdfpl ...

How to extract a single row table data from a pdf using python?

I need to extract tabular data from PDFs. Some tables in the PDF consist of only a single row. I have been trying to extract the data using camelot l ...

Extracting all tables using tabula

Reading a PDF file using df = tabula.read_pdf(pdf_file, pages='all') displays all tables from all pages, but when converting into a Panda ...

Why does my tabula template not output the data from the PDF file when run through Python?

I selected the area using Tabula in the app, as shown below, and created a template. The output in the web app works, but when I do it via the code below I get an error ...

Unable to extract tables from tabula or Camelot

I tried to extract the table below using Tabula, but it returned a null dataframe. It worked fine for other, similar tables. Tried us ...

Skip errors and continue loop when url provides no file

I am using Tabula-py to download and extract tables from PDFs via a list of URLs. The URLs are created based on rules and everything is working fine e ...

Importing rotated text from a PDF table such as with tabula-py in python

Is there a way to import rotated text from a PDF table such as with tabula-py in python? I realize I can just rename the column headers in this case, ...

Why is the data in the PDF written in the 1st column?

I have a pdf file called Question.pdf, and its content is as follows. Question.pdf I am converting my pdf file to an xlsx file using the python tabu ...

How to use tabula in AWS Lambda to read PDF table

Hello, I get the following error while trying to use tabula to read a table in a PDF. I was aware of some of the difficulties (here) using this packag ...

Empty lines occurring in the CSV file while converting a PDF document to CSV

I am new to Python. I have an issue while converting a PDF file into CSV format. I have used tabula for converting my PDF file into CSV, but while conv ...

Tabula-py: specify parameters for tabula.io.build_options

I am trying to understand how the build_options function defined in the tabula.io module and the java_options parameter of the convert_into function work. To understand ...

How can I extract the background color of a table cell within a PDF file using Python?

I've been using tabula-py, PyPDF2 and tika modules, but none of them seems to detect the background color of a table cell, which is within a PDF file. ...

Easiest way to ignore or drop one header row from first page, when parsing table spanning several pages

I am parsing a PDF with tabula-py, and I need to ignore the first two tables, but then parse the rest of the tables as one, and export to a CSV. On th ...

tabula-py not reading all rows for PDFs with alternating colors for each row when lattice is set to True

I am trying to extract all rows from the PDF attached here. Here is the code I used: The output shows only those rows which are in the grey backgr ...

Problem extracting table from pdf from web page with tabula (Web Scraping in Python)

When I extract a table from a page, the extraction itself works without problems, but the data is out of order. There is data from one column that appears as ...

Pdfplumber - Extract a table in pdf without any borders

I am trying to extract the table as shown in the image here into a data frame. I tried using tabula-py to extract the code but read_pdf returned me [] ...

Merging cells in the same column of the same df - Python

I am attempting to merge two cells together. This is because every unit under 'Chassis' should be an alphanumeric (ABCD123 ...