
Java PDFBox, extract data from a column of a table

I would like to find out how to extract data from this PDF (example image): http://postimg.org/image/ypebht5dx/

For example, I want to extract only the values in the "TENSIONE[V]" column, and whenever a cell is blank, write the letter "X" in the output instead. How can I do this?

The code I used is this:

 PDDocument p=PDDocument.load(new File("a.pdf"));
 PDFTextStripper t=new PDFTextStripper();
 System.out.println(t.getText(p));

and I get this output:

http://s23.postimg.org/wbhcrw03v/Immagine.png

These are just guidelines; adapt them to your use case. The code is untested, but it should help you solve your issue. If you have any questions, let me know.

String text = t.getText(p);
String[] lines = text.split("\\r?\\n"); // all lines of the extracted text, split on line breaks

String[] cols = lines[0].split("\\s+"); // cells of the first line, split on runs of whitespace
// cols[0] contains pins
// cols[1] contains TENSIONE[V]
// cols[2] contains TOLLRENZA; if that cell is missing, the array is simply shorter
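
To show how the split could be applied to every row, with "X" written for a blank cell, here is a minimal sketch. It works on the text already returned by getText, so it does not touch PDFBox itself. The class name TensioneColumn, the method extractColumn and the sample text are made up for illustration and are not taken from the PDF in the question. It assumes each row starts with the pin, that the wanted value (when present) is the token at the given index, and that a row with too few tokens means the cell is blank. Whitespace splitting cannot tell which cell is missing when a later column is filled in, so rows like that would need a position-based approach.

```java
import java.util.ArrayList;
import java.util.List;

public class TensioneColumn {

    // Returns the token at columnIndex for each non-empty data row,
    // or "X" when the row has too few tokens (assumed to mean a blank cell).
    // headerLines is the number of leading lines to skip (hypothetical parameter).
    static List<String> extractColumn(String text, int columnIndex, int headerLines) {
        List<String> values = new ArrayList<>();
        String[] lines = text.split("\\r?\\n");
        for (int i = headerLines; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue; // skip empty lines between rows
            }
            String[] cols = line.split("\\s+");
            values.add(columnIndex < cols.length ? cols[columnIndex] : "X");
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the output of t.getText(p)
        String sample = "PIN TENSIONE[V] TOLLRENZA\n1 5.0 0.1\n2\n3 3.3 0.2\n";
        System.out.println(extractColumn(sample, 1, 1)); // prints [5.0, X, 3.3]
    }
}
```

With the real document you would pass t.getText(p) instead of the sample string and adjust headerLines to the number of lines that precede the first data row.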
