Getting a decryption error while extracting data from a PDF with Selenium WebDriver

Question

I am trying to extract text from a URL that serves a PDF file, but I get an error like this:

INFO: Document is encrypted May 27, 2015 9:27:50 AM org.apache.pdfbox.filter.FlateFilter decode

    public void getTextFromPdf(String urlS) throws IOException {
        driver.get(urlS);
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        URL url = new URL(driver.getCurrentUrl());
        BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());

        // parse() reads the stream and populates the COSDocument object,
        // which is the in-memory representation of the PDF document.
        PDFParser parser = new PDFParser(fileToParse);
        parser.parse();

        // getPDDocument() returns the PD document that was parsed; call close() on it when done to release resources.
        // PDFTextStripper takes a PDF document, strips out all of the text and ignores the formatting.
        System.out.println(urlS);
        String output = new PDFTextStripper().getText(parser.getPDDocument());
        System.out.println(output);
        parser.getPDDocument().close();
        driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
    }

Answer 1

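For the PDFBox part, replace the PDFParser calls with the non-sequential loader (available in PDFBox 1.8.x), which should handle the encrypted document as long as it has no user password: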

        PDDocument doc = PDDocument.loadNonSeq(fileToParse, null);
        String output = new PDFTextStripper().getText(doc);
        doc.close();

About the dependencies, read this or use the pdfbox-app jar file that you can find here.
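
Before handing the stream to PDFBox, it can also help to check what the URL actually returned. The sketch below is not part of PDFBox or Selenium; PdfProbe and its method names are made up for illustration and use only the JDK. It reads the stream into a byte array, checks for the "%PDF-" header, and does a rough scan for an /Encrypt entry. The scan is only a heuristic: it assumes the key appears uncompressed in the file, and it can give false positives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not a PDFBox or Selenium class.
class PdfProbe {

    // Reads the whole stream into memory so the bytes can be inspected
    // first and passed to the PDF library afterwards.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // True if the data starts with the "%PDF-" header.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Heuristic only: true if the raw bytes contain "/Encrypt" anywhere.
    static boolean mentionsEncryption(byte[] data) {
        // ISO-8859-1 maps every byte to one char, so no bytes are lost.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.contains("/Encrypt");
    }
}
```

If looksLikePdf returns false, the server probably sent something other than the PDF, for example a login page, because url.openStream() opens a new connection that does not share the browser session. Otherwise the bytes can be passed on as new ByteArrayInputStream(data) in place of fileToParse.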
