
Why do I get an empty dataframe when using Tabula?

I have the following code:

import pandas as pd
import tabula

df = tabula.read_pdf(r'C:\Users\Max12\Desktop\xml\pdfminer\attachments\Factuur 78692661.PDF', area=[375,7,76,558], pages = 1)
df1 = pd.DataFrame.from_records(df)
print(df1)

According to the measurements in the attachments, the table should be inside this area. Why does Tabula not find it?

See the attachments for reference: Measurements and Jupyter notebook.

The problem is the area argument passed to read_pdf in your code.

According to the Tabula documentation, the area parameter is defined as follows:

area (list of float, list of list of float, optional) –
Portion of the page to analyze(top,left,bottom,right). Default is entire page.

Let's say you need to extract data from the middle of the page. With reference to the parameter above:

top == distance from the top of the page to the **start** (upper edge) of your desired data
left == distance from the left of the page to the **start** (left edge) of your desired data
bottom == distance from the top of the page to the **end** (lower edge) of your desired data
right == distance from the left of the page to the **end** (right edge) of your desired data

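So the first value in the area list (top) has to be less than the third value (bottom). Similarly, the second value (left) has to be less than the fourth value (right). In your code, area=[375,7,76,558] gives top = 375 and bottom = 76, so the top edge lies below the bottom edge and the region encloses nothing.

Only then can Tabula build a table from the given coordinates.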
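The ordering rule above can be checked before calling Tabula. The following is a minimal sketch, not part of tabula-py: check_area and read_table are hypothetical helper names, and the corrected area [76, 7, 375, 558] assumes that top and bottom were simply swapped in the question, which may not match your actual measurements.

```python
def check_area(area):
    """Return the area as floats if it describes a non-empty rectangle.

    area is [top, left, bottom, right], with top/bottom measured from the
    top of the page and left/right measured from the left of the page.
    """
    top, left, bottom, right = (float(v) for v in area)
    if top >= bottom:
        raise ValueError(f"top ({top}) must be less than bottom ({bottom})")
    if left >= right:
        raise ValueError(f"left ({left}) must be less than right ({right})")
    return [top, left, bottom, right]


def read_table(pdf_path, area, page=1):
    """Hypothetical wrapper: validate the area, then call tabula.read_pdf."""
    import tabula  # tabula-py, imported here so check_area works without it

    # With default settings, recent tabula-py versions return a list of DataFrames.
    return tabula.read_pdf(pdf_path, area=check_area(area), pages=page)


# Assumed correction: top and bottom swapped relative to the question.
corrected = check_area([76, 7, 375, 558])
```

Passing the original [375, 7, 76, 558] to check_area raises a ValueError, which makes the mistake visible instead of silently returning an empty result.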
