
PDF to byte array and vice versa

I need to convert a PDF to a byte array and vice versa.

Can anyone help me?

This is how I am converting to a byte array:

public static byte[] convertDocToByteArray(String sourcePath) {

    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);


        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();

        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
                System.out.println("IO Ex"+e);
    }
    return byteArray;
}

If I use the following code to convert it back to a document, the PDF file gets created, but opening it fails with 'Bad Format. Not a pdf'.

public static void convertByteArrayToDoc(byte[] b) {          

    OutputStream out;
    try {       
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }
}

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks to Farooque for pointing out that this works for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
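For the reverse direction, Files.write() accepts a byte[] and writes it to a path unchanged. Below is a minimal sketch of the full round trip, assuming Java 7 or later. The class name PdfRoundTrip and the method names toBytes and toFile are illustrative and not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the names are illustrative.
class PdfRoundTrip {

    // Reads the whole file into memory as raw bytes (no text decoding involved).
    static byte[] toBytes(String sourcePath) throws IOException {
        return Files.readAllBytes(Paths.get(sourcePath));
    }

    // Writes the bytes back out unchanged, creating or overwriting the target file.
    static void toFile(byte[] data, String targetPath) throws IOException {
        Files.write(Paths.get(targetPath), data);
    }

    public static void main(String[] args) throws IOException {
        // Source and target paths are taken from the command line
        byte[] pdf = toBytes(args[0]);
        toFile(pdf, args[1]);
        System.out.println("Copied " + pdf.length + " bytes");
    }
}
```

Because both calls load the entire file into memory, this is probably best suited to files of modest size.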

You basically need a helper method to read a stream into memory. This works pretty well:

public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then you'd call it with:

public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try 
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    } 
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}

Don't mix up text and binary data - it only leads to tears.
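To illustrate why, here is a small sketch that pushes a few bytes through a String and back. The class name TextVsBinary and the sample bytes are made up for the demonstration; the bytes above 0x7F stand in for the binary content a PDF typically contains.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative.
class TextVsBinary {

    // Decodes the bytes as UTF-8 text and encodes them again, as a text-based copy would.
    static byte[] throughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four bytes that are not valid UTF-8
        byte[] original = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] mangled = throughString(original);
        // The invalid sequences are replaced during decoding, so the arrays no longer match
        System.out.println("unchanged: " + Arrays.equals(original, mangled));
    }
}
```

Pure ASCII input would survive this trip, which is why the problem often goes unnoticed until real binary data shows up.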

The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the contents of the PDF document.
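A quick way to see this is to print the result of toString() on a stream. The sketch below uses a ByteArrayInputStream so it runs without a file on disk; the class name StreamToStringDemo and the method wrongWay are hypothetical and only mirror the mistake in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical demo class; the names are illustrative.
class StreamToStringDemo {

    // Mirrors the question's mistake: toString() describes the stream object, not its contents.
    static byte[] wrongWay(InputStream in) {
        return in.toString().getBytes();
    }

    public static void main(String[] args) {
        byte[] content = {0x25, 0x50, 0x44, 0x46}; // "%PDF"
        InputStream in = new ByteArrayInputStream(content);
        // Prints the class name and an identity hash, something like java.io.ByteArrayInputStream@<hex>
        System.out.println(in.toString());
        System.out.println(wrongWay(in).length + " bytes, none of them read from the stream");
    }
}
```

The bytes written to the "PDF" in the question are therefore just the characters of that description, which explains the 'Bad Format' error.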

You want to read the PDF only as bytes, because PDF is a binary format. You will then be able to write out that same byte array, and it will be a valid PDF because it has not been modified.

E.g. to read a file as bytes:

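File file = new File(sourcePath);
// Cast is needed because length() returns a long; assumes the file is smaller than 2 GB
byte[] bytes = new byte[(int) file.length()];
try (DataInputStream inputStream = new DataInputStream(new FileInputStream(file))) {
    // readFully keeps reading until the array is full; a single read() may stop early
    inputStream.readFully(bytes);
}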

You can do it with Apache Commons IO without worrying about the internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns data of type byte[].


public static void main(String[] args) throws IOException {
        File file = new File("java.pdf");

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        // try-with-resources closes the input stream even if a read fails
        try (FileInputStream fis = new FileInputStream(file)) {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                // write only the readNum bytes that were actually read, starting at offset 0
                bos.write(buf, 0, readNum);
                System.out.println("read " + readNum + " bytes,");
            }
        }
        byte[] bytes = bos.toByteArray();

        // below is the different part: write the bytes back out as a new PDF
        File someFile = new File("java2.pdf");
        try (FileOutputStream fos = new FileOutputStream(someFile)) {
            fos.write(bytes);
        }
    }

This worked for me. I haven't used any third-party libraries, just the ones that ship with Java.

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PDFUtility {

public static void main(String[] args) throws IOException {
    /**
     * Converts byte stream into PDF.
     */
    PDFUtility pdfUtility = new PDFUtility();
    byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
    fileOutputStream.write(byteStreamPDF);
    fileOutputStream.close();
    System.out.println("File written successfully");
}

/**
 * Creates PDF to Byte Stream
 *
 * @return
 * @throws IOException
 */
protected byte[] convertPDFtoByteStream() throws IOException {
    Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
    return Files.readAllBytes(path);
}

}

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}

inputStream.close();
return outputStream.toByteArray();

Aren't you creating the PDF file but never actually writing the byte array back to it? That is why you cannot open the PDF.

out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.write(b, 0, b.length); // actually write the byte array to the file
out.close();

This is in addition to correctly reading the PDF into a byte array.

To convert a PDF to a byte array:
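Putting that fix into the question's method might look like the sketch below. The targetPath parameter is an assumption added for illustration (the question hard-codes the path), and the class name ByteArrayToPdf is hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper class; the name is illustrative.
class ByteArrayToPdf {

    // Writes the byte array to targetPath; try-with-resources closes the stream
    // even if write() throws.
    static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b);
        }
    }
}
```

Letting the IOException propagate, rather than printing it and continuing, makes a failed write visible to the caller.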

public byte[] pdfToByte(String filePath)throws JRException {

         File file = new File(filePath);
         FileInputStream fileInputStream;
         byte[] data = null;
         byte[] finalData = null;
         ByteArrayOutputStream byteArrayOutputStream = null;

         try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int)file.length()];
            finalData = new byte[(int)file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();

            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();

            fileInputStream.close(); 

        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }

        return finalData;

    }

This works for me:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
    byte[] buffer = new byte[1024];
    int bytesRead;
    while((bytesRead = pdfin.read(buffer))!=-1){
        pdfout.write(buffer,0,bytesRead);
    }
}

But Jon's answer doesn't work for me if used in the following way:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){

    int k = readFully(pdfin).length;
    System.out.println(k);
}

It outputs zero as the length. Why is that?

None of these worked for us, possibly because our input stream was bytes from a REST call rather than from a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, then using the Tika PDF parser to parse it, and then calling the toString() method.

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

            InputStream stream = response.asInputStream();
            Parser parser = new AutoDetectParser(); // Should auto-detect!
            ContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();

            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
            for (int i = 0; i < metadata.names().length; i++) {
                String item = metadata.names()[i];
                System.out.println(item + " -- " + metadata.get(item));
            }

            System.out.println("!!Printing pdf content: \n" +handler.toString());
            System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

I have implemented similar behaviour in my application without any problems. Below is my version of the code, and it works.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        // try-with-resources closes the stream when done
        try (DataInputStream reader = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            // readFully keeps reading until the whole array is filled
            reader.readFully(bytes, 0, length);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return bytes;
    }

public String encodeFileToBase64Binary(String fileName)
        throws IOException {
        System.out.println("encodeFileToBase64Binary: "+ fileName);
    File file = new File(fileName);
    byte[] bytes = loadFile(file);
    byte[] encoded = Base64.encodeBase64(bytes);
    String encodedString = new String(encoded);
    System.out.println("ARCHIVO B64: "+encodedString);


    return encodedString;
}

public static byte[] loadFile(File file) throws IOException {
    long length = file.length();
    if (length > Integer.MAX_VALUE) {
        // A Java array cannot hold more than Integer.MAX_VALUE elements
        throw new IOException("File is too large: " + file.getName());
    }
    byte[] bytes = new byte[(int) length];

    // try-with-resources closes the stream even if reading fails
    try (InputStream is = new FileInputStream(file)) {
        int offset = 0;
        int numRead = 0;
        // read() may return fewer bytes than requested, so loop until the array is full
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }

        if (offset < bytes.length) {
            throw new IOException("Could not completely read file " + file.getName());
        }
    }
    return bytes;
}

PDFs may contain binary data, and chances are it gets mangled when you call toString(). It seems to me that you want this:

        FileInputStream inputStream = new FileInputStream(sourcePath);

        // available() is only an estimate of the bytes readable without blocking
        int numberBytes = inputStream.available();
        byte[] bytearray = new byte[numberBytes];

        inputStream.read(bytearray);
        inputStream.close();
