
How to check whether a PDF file is password protected?

How can I check in Java whether a PDF file is password protected? I know of several tools and libraries that can do this, but I want to know whether it is possible with a plain Java program.

You can use PDFBox:

http://pdfbox.apache.org/

Code example:

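// Needs java.io.File, java.io.IOException, org.apache.pdfbox.pdmodel.PDDocument
// and org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException (PDFBox 2.x).
try( PDDocument document = PDDocument.load( yourPDFfile ) )   // yourPDFfile is a java.io.File
{
    if( document.isEncrypted() )
    {
      // Encrypted, but it could be opened with an empty user password
    }
}
catch( InvalidPasswordException e )
{
    // Encrypted, and a non-empty password is required to open it
}
catch( IOException e )
{
    // The file could not be read or parsed
}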

Using Maven?

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.0</version>
</dependency>

Update

As per mkl's comment below this answer, the PDF specification permits two kinds of cross-reference structure: (1) cross-reference tables and (2) cross-reference streams. The following solution only addresses the first kind; this answer needs to be updated to address the second.

====

All of the answers above rely on third-party libraries, which the OP already knows about. The OP is asking for a native Java approach. My answer is yes, you can do it, but it requires a lot of work.

It is a two-step process:

Step 1: Figure out if the PDF is encrypted

As per Adobe's PDF 1.7 specs (pages 97 and 115), if the trailer dictionary contains the key "/Encrypt", the PDF is encrypted (the encryption could be simple password protection, RC4, AES, or some custom encryption). Here's some sample code:

    // Needs: java.nio.charset.StandardCharsets, java.nio.file.Files, java.nio.file.Paths
    boolean isEncrypted = false;
    try {
        byte[] byteArray = Files.readAllBytes(Paths.get("Resources/1.pdf"));
        // Decode as ISO-8859-1 so that every byte maps to exactly one char. We only care about the textual parts of the PDF.
        String pdfContent = new String(byteArray, StandardCharsets.ISO_8859_1);
        int lastTrailerIndex = pdfContent.lastIndexOf("trailer");
        if (lastTrailerIndex >= 0) {
            String newString = pdfContent.substring(lastTrailerIndex);
            int firstEOFIndex = newString.indexOf("%%EOF");
            // Fall back to the rest of the file if no %%EOF marker follows the trailer.
            String trailer = firstEOFIndex >= 0 ? newString.substring(0, firstEOFIndex) : newString;
            if (trailer.contains("/Encrypt"))
                isEncrypted = true;
        }
    }
    catch (Exception e) {
        // The file could not be read; isEncrypted stays false.
        System.out.println(e);
    }
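
The Update note at the top of this answer says that files using cross-reference streams are not covered. The following is only a rough sketch of how the same text search could be widened to cover them, on the assumption that the trailer keys then live in the dictionary of a stream object whose type is /XRef. The class and method names (PdfEncryptionSniffer, looksEncrypted) are made up for illustration, and this is a heuristic over raw bytes, not a real PDF parser: binary stream data that happens to contain the word "trailer" could confuse it.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Heuristic sketch (hypothetical helper): looks for an /Encrypt key in trailer-like dictionaries. */
final class PdfEncryptionSniffer {
    // "/Encrypt" as a whole key name, so that e.g. "/EncryptMetadata" does not count.
    private static final Pattern ENCRYPT_KEY = Pattern.compile("/Encrypt(?![A-Za-z0-9])");
    // Marks the dictionary of a cross-reference stream object.
    private static final Pattern XREF_TYPE = Pattern.compile("/Type\\s*/XRef(?![A-Za-z0-9])");

    static boolean hasEncryptKey(String dictionaryText) {
        return ENCRYPT_KEY.matcher(dictionaryText).find();
    }

    static boolean looksEncrypted(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so no data is lost.
        String pdf = new String(pdfBytes, StandardCharsets.ISO_8859_1);

        // Case 1: classic cross-reference tables, "trailer << ... >> startxref".
        int idx = 0;
        while ((idx = pdf.indexOf("trailer", idx)) >= 0) {
            int end = pdf.indexOf("startxref", idx);
            String trailer = end >= 0 ? pdf.substring(idx, end) : pdf.substring(idx);
            if (hasEncryptKey(trailer)) {
                return true;
            }
            idx += "trailer".length();
        }

        // Case 2: cross-reference streams. The dictionary sits between the
        // object header ("N G obj") and the "stream" keyword.
        Matcher m = XREF_TYPE.matcher(pdf);
        while (m.find()) {
            int start = pdf.lastIndexOf("obj", m.start());
            int end = pdf.indexOf("stream", m.end());
            if (start >= 0 && end > start && hasEncryptKey(pdf.substring(start, end))) {
                return true;
            }
        }
        return false;
    }
}
```

It can be called as PdfEncryptionSniffer.looksEncrypted(Files.readAllBytes(Paths.get("Resources/1.pdf"))). Like the code above, it only reports that an /Encrypt entry exists, not which kind of encryption is used.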

Step 2: Figure out the encryption type

This step is more complex. I don't have a code sample yet. But here is the algorithm:

  1. Read the value of the key "/Encrypt" from the trailer as read in step 1 above. For example, the value might be 288 0 R.
  2. Look for the bytes "288 0 obj". This is the location of the "encryption dictionary" object in the document. This object boundary ends at the string "endobj".
  3. Look for the key "/Filter" in this object. The "Filter" is the one that identifies the document's security handler. If the value of the "/Filter" is "/Standard", the document uses the built-in password-based security handler.
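
The three steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the whole file has already been decoded into a String (as in step 1), the /Encrypt value is an indirect reference such as 288 0 R rather than an inline dictionary, and the first object with that number is the one that counts. The names EncryptFilterFinder and findSecurityHandler are invented for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical helper): resolves the /Encrypt reference and reads its /Filter. */
final class EncryptFilterFinder {
    // Step 1: "/Encrypt 288 0 R" -> object number 288, generation 0.
    private static final Pattern ENCRYPT_REF =
            Pattern.compile("/Encrypt\\s+(\\d+)\\s+(\\d+)\\s+R");
    // Step 3: "/Filter /Standard" -> "Standard".
    private static final Pattern FILTER =
            Pattern.compile("/Filter\\s*/([A-Za-z0-9_.#-]+)");

    /**
     * @param pdf     the whole file decoded as ISO-8859-1
     * @param trailer the trailer text extracted in step 1
     * @return the security handler name, or empty if it could not be determined
     */
    static Optional<String> findSecurityHandler(String pdf, String trailer) {
        Matcher ref = ENCRYPT_REF.matcher(trailer);
        if (!ref.find()) {
            return Optional.empty(); // no /Encrypt key, or it is an inline dictionary
        }
        // Step 2: locate "288 0 obj"; the lookbehind stops "1288 0 obj" from matching.
        Pattern header = Pattern.compile(
                "(?<![0-9])" + ref.group(1) + "\\s+" + ref.group(2) + "\\s+obj\\b");
        Matcher obj = header.matcher(pdf);
        if (!obj.find()) {
            return Optional.empty();
        }
        // The object ends at the next "endobj".
        int end = pdf.indexOf("endobj", obj.end());
        String dictionary = end >= 0 ? pdf.substring(obj.end(), end) : pdf.substring(obj.end());

        Matcher filter = FILTER.matcher(dictionary);
        return filter.find() ? Optional.of(filter.group(1)) : Optional.empty();
    }
}
```

A result of "Standard" would then correspond to the built-in password-based security handler described in item 3. An empty result only means this sketch could not find the filter, not that the file is unencrypted.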

If you just want to know whether the PDF is encrypted, without worrying about whether the encryption takes the form of an owner/user password or some more advanced algorithm, you don't need step 2 above.

Hope this helps.

Using the iText PDF API, we can identify a password-protected PDF.

Example:

    try {
        new PdfReader("C:\\Password_protected.pdf");
    } catch (BadPasswordException e) {
        System.out.println("PDF is password protected..");
    } catch (Exception e) {
        e.printStackTrace();
    }

You can also use iText to validate a PDF, i.e. to check that it can be read and written.

The following code snippet shows how:

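boolean isValidPdf = false;
// try-with-resources closes the stream even if PdfReader throws
try (InputStream tempStream = new FileInputStream(new File("path/to/pdffile.pdf"))) {
    PdfReader reader = new PdfReader(tempStream);
    // true if the file is not encrypted or was opened with full (owner) permissions
    isValidPdf = reader.isOpenedWithFullPermissions();
} catch (Exception e) {
    isValidPdf = false;
}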

The correct answer for how to do it in Java is the one by @vhs.

However, in any application by far the simplest approach is to use the very lightweight pdfinfo tool and filter its output for the encryption status. Here, using Windows cmd, I can instantly get a report showing that two different copies of the same file are encrypted:

>forfiles /m *.pdf /C "cmd /c echo @file &pdfinfo @file|find /i \"Encrypted\""

"Certificate (9).pdf"
Encrypted:      no

"ds872 source form.pdf"
Encrypted:      AES 128-bit

"ds872 filled form.pdf"
Encrypted:      AES 128-bit

"How to extract data from a particular area in a PDF file - Stack Overflow.pdf"
Encrypted:      no

"Test.pdf"
Encrypted:      no

>
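
Since the question is about Java, here is a rough sketch of driving the same tool from a Java program. It assumes pdfinfo is on the PATH and prints an "Encrypted:" line as in the report above. The class and method names are invented for this example. I have not checked what pdfinfo prints when a password is needed just to open the file, so a missing line is reported as "unknown" (an empty Optional) rather than as "not encrypted". It uses InputStream.readAllBytes, so it needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Optional;

/** Sketch (hypothetical helper): reads the "Encrypted:" line from pdfinfo output. */
final class PdfInfoEncryption {

    /** Returns the text after "Encrypted:", e.g. "no" or "AES 128-bit". */
    static Optional<String> parseEncryptedLine(String pdfinfoOutput) {
        for (String line : pdfinfoOutput.split("\\R")) {
            if (line.startsWith("Encrypted:")) {
                return Optional.of(line.substring("Encrypted:".length()).trim());
            }
        }
        return Optional.empty(); // pdfinfo printed no such line
    }

    /** Runs "pdfinfo <file>" and returns the encryption status it reports, if any. */
    static Optional<String> encryptionStatus(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("pdfinfo", pdf.toString())
                .redirectErrorStream(true) // merge stderr so the pipe cannot block
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return parseEncryptedLine(output);
    }
}
```

A caller would treat any value other than "no" as encrypted, for example by testing encryptionStatus(path) and comparing the value with "no".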

The solution:

1) Install PDF Parser: http://www.pdfparser.org/

2) Edit Parser.php so that it contains this section:

if (isset($xref['trailer']['encrypt'])) {
    echo('Your alert message');
    exit();
}

3) In the PHP script that handles your form post (e.g. upload.php), first add:

require '...yourdir.../vendor/autoload.php';

then write this function:

function pdftest_is_encrypted($form) {
    $parser = new \Smalot\PdfParser\Parser();
    $pdf    = $parser->parseFile($form);
}

and then call the function:

pdftest_is_encrypted($_FILES["upfile"]["tmp_name"]);

That is all. If you try to load a password-protected PDF, the system returns the error "Your alert message".

