How to read a PDF file line by line and create a CSV

Question

Here is my PDF. I found THIS and used it to scrape my PDF. The output looks like this:

6 BEDROOMS
NameAddressUnitSizeKeyRentSq FtMove in DateNotesTenant
Prop #
Texan 261009 West 26th3076x3$4,6952,1368/15/14$1,000 Bonus (1) Park -

It's pretty mixed up. Is it because the PDF is formatted in a way that is unreadable? I thought there was a way to scrape each row and create a CSV with the columns, by iteration or something.

For example, populate a CSV with columns like this:

T26 | Texan 26          | 1009 West 26th | 307      | 6x3 | ... 
e075| Texan North Campus| 5117 N Lamar   |See below | 6x3 |...

Is there a way around this?

Answer 1

The code snippet you used has produced practically unusable data, so I don't think that is the way to go. Scraping from a PDF is generally rather difficult. However, take a look at pdftables.com: they provide an API for scraping tables from PDF documents, which I've found works in the majority of cases. It's your best chance at this, I'd say.

Answer 2

You can use Camelot, a Python library, to write a script that extracts tabular data from your PDF and exports it to a CSV. The documentation is at http://camelot-py.readthedocs.io. It would be helpful if you could post a link to your PDF. Here's a generic code example:

>>> import camelot
>>> tables = camelot.read_pdf('file.pdf')
>>> type(tables[0].df)
<class 'pandas.core.frame.DataFrame'>
>>> tables[0].to_csv('file.csv')

Disclaimer: I'm the author of the library.
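
If the extracted rows still need tidying before they go into the CSV, the "by iteration" approach from the question can be sketched with the standard csv module alone. This is only a rough illustration: the `rows_to_csv` helper is hypothetical, the five column names are picked from the header in the question, and the sample rows are the two shown there rather than anything read from the actual PDF.

```python
import csv
import io

# Column names taken from the header in the question; using only these
# five columns is an assumption made for the sake of the example.
COLUMNS = ["Prop #", "Name", "Address", "Unit", "Size"]


def rows_to_csv(rows, out):
    """Write the header, then one CSV record per extracted row, to `out`."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for row in rows:
        # Strip stray whitespace, then trim or pad the row so that it has
        # exactly one cell per column.
        cells = [str(cell).strip() for cell in row][:len(COLUMNS)]
        cells += [""] * (len(COLUMNS) - len(cells))
        writer.writerow(cells)


# Sample rows copied from the question (not extracted from a real PDF).
rows = [
    ["T26", "Texan 26", "1009 West 26th", "307", "6x3"],
    ["e075", "Texan North Campus", "5117 N Lamar", "See below", "6x3"],
]

# An in-memory buffer stands in for a file opened with
# open("file.csv", "w", newline="").
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With Camelot, the rows could presumably come from something like `tables[0].df.values.tolist()`, since `df` is a regular pandas DataFrame.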
