How to convert PDF to XML using PDFBox or any other library?

I want to convert PDF files into XML. Is there a Java library that can be used for this?

You can get an XML representation (specifically XHTML) of a PDF document's content using the Apache Tika library, as below:

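// Imports: java.io.*, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata,
// org.apache.tika.parser.AutoDetectParser, org.apache.tika.sax.ToXMLContentHandler
try (InputStream stream = new FileInputStream("sample.pdf")) {
    ContentHandler handler = new ToXMLContentHandler();
    Metadata metadata = new Metadata();
    AutoDetectParser parser = new AutoDetectParser();
    // parse() returns void; the XHTML output accumulates in the handler
    parser.parse(stream, handler, metadata);
    System.out.println(handler.toString());
}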
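Since the handler's output is XHTML, it can be read back with the XML APIs that ship with the JDK. Here is a minimal sketch of that, under some assumptions: the class name XhtmlParagraphs and the paragraphs helper are hypothetical, and the sample string is hand-written to stand in for the handler's output, not real Tika output from a PDF. It assumes the elements live in the XHTML namespace and that paragraph text is wrapped in p elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlParagraphs {

    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: returns the trimmed text of every <p> element
    // found in an XHTML string.
    public static List<String> paragraphs(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed for the namespace-qualified lookup below.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList nodes = doc.getElementsByTagNameNS(XHTML_NS, "p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for handler.toString(); not real Tika output.
        String sample =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>sample</title></head>"
            + "<body><div class=\"page\">"
            + "<p>First paragraph.</p><p>Second paragraph.</p>"
            + "</div></body></html>";

        for (String p : paragraphs(sample)) {
            System.out.println(p);
        }
    }
}
```

In real use you would pass handler.toString() to paragraphs instead of the sample string. If you need a different XML schema than XHTML, you could transform the parsed document from here, for example with XSLT.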

