
How to extract text to JLabel using PDFBox

I haven't been coding for long and decided to write a program that downloads the current Official Golf World Rankings in PDF form and then displays the top 10 using JLabels.

While the program is able to download the file, I have been unable to find out how to extract individual cells from the table containing the data, i.e. how to extract the "This Week", "Name", and "Country" columns into individual arrays.

Could someone please give me some advice on how to go about doing this?

I recently had to do something similar. My code looks like this (using PDFBox):

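// Imports for the PDFBox 1.x API used below
import java.io.FileInputStream;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Parse the downloaded PDF into a document model
PDFParser pdfParser = new PDFParser(new FileInputStream("c:\\temp\\owgr49f2013.pdf"));
pdfParser.parse();
PDDocument pdDocument = pdfParser.getPDDocument();

// Extract the text, inserting "###" as the word separator
PDFTextStripper stripper = new PDFTextStripper("UTF-8");
stripper.setSortByPosition(false);
stripper.setWordSeparator("###");
System.out.println(stripper.getText(pdDocument));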

You'll then need to pull the fields you want out of the resulting text, for example with regular expressions.
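As a rough sketch of that last step, the text returned by the stripper could be split on the "###" separator and filtered down to the data rows. The class name RankingParser, the method topRows, the column positions, and the sample rows are all hypothetical; the real layout of the stripped text has to be checked by printing it first, and this assumes each name and country comes out as a single field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RankingParser {
    // Same separator that was passed to stripper.setWordSeparator(...)
    private static final String SEPARATOR = "###";

    /**
     * Returns up to `limit` rows as {rank, name, country}.
     * nameCol and countryCol are zero-based field positions; they are
     * assumptions to adjust after inspecting the real stripper output.
     */
    public static List<String[]> topRows(String text, int nameCol, int countryCol, int limit) {
        List<String[]> rows = new ArrayList<>();
        int needed = Math.max(nameCol, countryCol) + 1;
        for (String line : text.split("\\R")) {
            if (rows.size() >= limit) {
                break;
            }
            String[] fields = line.trim().split(Pattern.quote(SEPARATOR));
            // Keep only data rows: enough fields and a numeric rank in the first one
            if (fields.length < needed || !fields[0].trim().matches("\\d+")) {
                continue;
            }
            rows.add(new String[] {
                fields[0].trim(), fields[nameCol].trim(), fields[countryCol].trim()
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample text, not real ranking data
        String sample = "This Week###Last Week###Name###Country\n"
                + "1###1###Player A###USA\n"
                + "2###3###Player B###ENG\n";
        for (String[] row : topRows(sample, 2, 3, 10)) {
            System.out.println(row[0] + ". " + row[1] + " (" + row[2] + ")");
        }
    }
}
```

Each returned row can then be shown in the UI with something like new JLabel(row[0] + ". " + row[1] + " (" + row[2] + ")"). Passing the column positions as parameters keeps the parsing independent of the exact PDF layout, which may change between ranking weeks.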
