
Failure encoding files in base64 java

I have this class to encode and decode a file. When I run it with .txt files, it works. But when I run it with .jpg or .doc files, I cannot open the resulting file, or it is not identical to the original. I don't know why this is happening. I based this class on http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html , but I want to change this line

byte imageData[] = new byte[(int) file.length()];

to

byte example[] = new byte[1024];

and read the file as many times as needed. Thanks.

import java.io.*;
import java.util.*;

  public class Encode {

input = input file path, output = output file path, imageDataString = the encoded string

  String input;
  String output;
  String imageDataString;


  public void setFileInput(String input){
    this.input=input;
  }

  public void setFileOutput(String output){
    this.output=output;
  }

  public String getFileInput(){
    return input;
  }

  public String getFileOutput(){
    return output;
  }

  public String getEncodeString(){
    return  imageDataString;
  }

  public String processCode(){
    StringBuilder sb= new StringBuilder();

    try{
        File fileInput= new File( getFileInput() );
        FileInputStream imageInFile = new FileInputStream(fileInput);

I have seen examples where people create a byte[] with the same length as the file. I don't want that, because I will not know in advance how long the file will be.

        byte buff[] = new byte[1024];

        int r = 0;

        while ( ( r = imageInFile.read( buff)) > 0 ) {

          String imageData = encodeImage(buff);

          sb.append( imageData);

          if ( imageInFile.available() <= 0 ) {
            break;
          }
        }



       } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
      } catch (IOException ioe) {
        System.out.println("Exception while reading the file " + ioe);

    } 

        imageDataString = sb.toString();

       return imageDataString;
}  


  public  void processDecode(String str) throws IOException{

      byte[] imageByteArray = decodeImage(str);
      File fileOutput= new File( getFileOutput());
      FileOutputStream imageOutFile = new FileOutputStream( fileOutput);

      imageOutFile.write(imageByteArray);
      imageOutFile.close();

}

 public static String encodeImage(byte[] imageByteArray) {

      return  Base64.getEncoder().withoutPadding().encodeToString( imageByteArray);

    }

    public static byte[] decodeImage(String imageDataString) {
      return  Base64.getDecoder().decode(  imageDataString);  

    }


  public static void main(String[] args) throws IOException {

    Encode a = new Encode();

    a.setFileInput( "C://Users//xxx//Desktop//original.doc");
    a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");

    a.processCode( );

    a.processDecode( a.getEncodeString());

    System.out.println("C O P I E D");
  }
}

I tried changing

String imageData = encodeImage(buff);

to

String imageData = encodeImage(buff,r);

and the method encodeImage to

public static String encodeImage(byte[] imageByteArray, int r) {

     byte[] aux = new byte[r];

     for ( int i = 0; i < aux.length; i++) {
       aux[i] = imageByteArray[i];

       if ( aux[i] <= 0 ) {
         break;
       }
     }
return  Base64.getDecoder().decode(  aux);
}

But I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits   

You have two problems in your program.

The first, as mentioned by @Joop Eggen, is that you are not handling your input correctly.

In fact, Java does not promise that a read will fill the entire 1024 bytes, even in the middle of the file. It could read just 50 bytes and tell you it read 50 bytes, and then read 50 more the next time.

Suppose you read 1024 bytes in the previous round, and in the current round you read only 50. Your byte array now contains the 50 new bytes, and the rest are old bytes from the previous read!

So you always need to copy exactly the number of bytes read into a new array, and pass that to your encoding function.

So, to fix this particular problem, you'll need to do something like:

 while ( ( r = imageInFile.read( buff)) > 0 ) {

      byte[] realBuff = Arrays.copyOf( buff, r );

      String imageData = encodeImage(realBuff);

      ...
 }
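
The stale-byte effect described above can be reproduced without a file. The following is a minimal sketch, assuming an in-memory stream of five bytes and a four-byte buffer; the class name StaleBufferDemo and the helper lastBufferContents are made up for illustration and are not part of the original program.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StaleBufferDemo {

    // Reads the stream chunk by chunk and returns what would be handed to the
    // encoder after the last successful read. With trim == false the whole
    // buffer is used (the bug); with trim == true only the r bytes just read.
    public static byte[] lastBufferContents(InputStream in, int bufferSize, boolean trim)
            throws IOException {
        byte[] buff = new byte[bufferSize];
        byte[] last = new byte[0];
        int r;
        while ((r = in.read(buff)) > 0) {
            last = trim ? Arrays.copyOf(buff, r) : buff.clone();
        }
        return last;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};

        // Second read returns 1 byte, but positions 1..3 still hold old data.
        System.out.println(Arrays.toString(
                lastBufferContents(new ByteArrayInputStream(data), 4, false))); // [5, 2, 3, 4]

        // Trimming to the number of bytes actually read removes the garbage.
        System.out.println(Arrays.toString(
                lastBufferContents(new ByteArrayInputStream(data), 4, true)));  // [5]
    }
}
```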

However, this is not the only problem here. Your real problem is with the Base64 encoding itself.

What Base64 does is take your bytes, break them into 6-bit chunks, and treat each chunk as a number N between 0 and 63. It then takes the Nth character from its character table to represent that chunk.
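
To make the 6-bit chunking concrete, here is a hand-rolled sketch that encodes exactly three bytes and compares the result with java.util.Base64. The class SixBitDemo and the method encodeThreeBytes are illustrative names only; the table is the standard Base64 alphabet.

```java
import java.util.Base64;

public class SixBitDemo {

    // The standard Base64 alphabet: index N maps to the Nth character.
    private static final String TABLE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes (24 bits) into four Base64 characters.
    public static String encodeThreeBytes(byte b0, byte b1, byte b2) {
        // Pack the three bytes into one 24-bit integer.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        StringBuilder sb = new StringBuilder(4);
        // Take the four 6-bit chunks from most to least significant.
        for (int shift = 18; shift >= 0; shift -= 6) {
            int n = (bits >> shift) & 0x3F; // a number between 0 and 63
            sb.append(TABLE.charAt(n));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = {0b01010101, (byte) 0b11110000, (byte) 0b10101010};
        System.out.println(encodeThreeBytes(input[0], input[1], input[2])); // VfCq
        System.out.println(Base64.getEncoder().encodeToString(input));      // VfCq
    }
}
```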

But this means it can't just encode a single byte or two bytes on their own. A byte contains 8 bits, which means one chunk of 6 bits and 2 leftover bits. Two bytes have 16 bits: that's 2 chunks of 6 bits and 4 leftover bits.

To solve this problem, Base64 always encodes 3 consecutive bytes (24 bits, exactly four 6-bit chunks) at a time. If the input length is not divisible by three, it adds extra zero bits to complete the last chunk.
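
As a small illustration of that rule, the sketch below encodes one, two, and three bytes with and without padding. The class name PaddingDemo and its two helpers are hypothetical; only the java.util.Base64 calls come from the JDK.

```java
import java.util.Base64;

public class PaddingDemo {

    // Standard encoding: '=' marks groups that were shorter than 3 bytes.
    public static String padded(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Same encoding with the trailing '=' characters left out.
    public static String unpadded(byte[] bytes) {
        return Base64.getEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] one = {0x55};
        byte[] two = {0x55, (byte) 0xF0};
        byte[] three = {0x55, (byte) 0xF0, (byte) 0xAA};

        System.out.println(padded(one) + " / " + unpadded(one));     // VQ== / VQ
        System.out.println(padded(two) + " / " + unpadded(two));     // VfA= / VfA
        System.out.println(padded(three) + " / " + unpadded(three)); // VfCq / VfCq
    }
}
```

Only an input whose length is a multiple of 3 produces the same output either way, which is why intermediate chunks need to be a multiple of 3 bytes long.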

Here is a little program that demonstrates the problem:

package testing;

import java.util.Base64;

public class SimpleTest {

    public static void main(String[] args) {

        // An array containing six bytes to encode and decode.
        byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };

        // The same array broken into three chunks of two bytes.

        byte[][] threeTwoByteArrays = {
            {       0b01010101, (byte) 0b11110000 },
            { (byte)0b10101010,        0b00001111 },
            { (byte)0b11001100,        0b00110011 }
        };
        Base64.Encoder encoder = Base64.getEncoder().withoutPadding();

        // Encode the full array

        String encodedFullArray = encoder.encodeToString(fullArray);

        // Encode the three chunks consecutively 

        StringBuilder encodedStringBuilder = new StringBuilder();
        for ( byte [] twoByteArray : threeTwoByteArrays ) {
            encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
        }
        String encodedInChunks = encodedStringBuilder.toString();

        System.out.println("Encoded full array: " + encodedFullArray);
        System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);

        // Now  decode the two resulting strings

        Base64.Decoder decoder = Base64.getDecoder();

        byte[] decodedFromFull = decoder.decode(encodedFullArray);   
        System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));

        byte[] decodedFromChunked = decoder.decode(encodedInChunks);
        System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
    }

    /**
     * Convert a byte array to a string representation in binary
     */
    public static String byteArrayBinaryString( byte[] bytes ) {
        StringBuilder sb = new StringBuilder();
        sb.append('[');
        for ( byte b : bytes ) {
            sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
        }
        if ( sb.length() > 1) {
            sb.setCharAt(sb.length() - 1, ']');
        } else {
            sb.append(']');
        }
        return sb.toString();
    }
}

So, imagine my 6-byte array is your image file. And imagine that your buffer reads not 1024 bytes but 2 bytes each time. This is going to be the output of the encoding:

Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM

As you can see, the encoding of the full array gave us 8 characters. Each group of three bytes is converted into four chunks of 6 bits, which in turn are converted into four characters.

But the encoding of the three two-byte arrays gave you a string of 9 characters. It's a completely different string! Each group of two bytes was extended to three chunks of 6 bits by padding with zeros. And since you asked for no padding, it produces only 3 characters per group, without the extra = that usually marks input whose byte count is not divisible by 3.

The output from the part of the program that decodes the correct 8-character string is fine:

Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]

But the result of attempting to decode the incorrect 9-character string is:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
    at java.util.Base64$Decoder.decode0(Base64.java:734)
    at java.util.Base64$Decoder.decode(Base64.java:526)
    at java.util.Base64$Decoder.decode(Base64.java:549)
    at testing.SimpleTest.main(SimpleTest.java:34)

Not good! A properly padded Base64 string always has a length that is a multiple of 4, and we have 9 characters. Even without padding, a single leftover character is never valid, because 6 bits cannot form a whole byte.

Since you chose a buffer size of 1024, which is not a multiple of 3, this problem will occur. You need to encode a multiple of 3 bytes each time to produce the proper string. So in fact, you need a buffer of size 3072 or something similar.

But because of the first problem, be very careful about what you pass to the encoder, because a read can always return fewer than 3072 bytes. If that number is not divisible by three, the same problem will occur.
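
One way to combine both fixes is sketched below: the buffer size is a multiple of 3, and a helper keeps reading until the buffer is full or the stream ends, so only the final chunk can have a length that is not divisible by 3. This is a sketch under those assumptions, not the only possible solution; the names ChunkedBase64, readFully, and encode are invented for the example, and it uses the default padded encoder instead of withoutPadding().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

public class ChunkedBase64 {

    // A multiple of 3, so every full chunk encodes with no leftover bits.
    private static final int CHUNK_SIZE = 3 * 1024;

    // Keeps reading until buff is full or the stream ends.
    // Returns the number of bytes stored, which is below buff.length
    // only when the end of the stream was reached.
    public static int readFully(InputStream in, byte[] buff) throws IOException {
        int total = 0;
        while (total < buff.length) {
            int r = in.read(buff, total, buff.length - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return total;
    }

    // Encodes the whole stream chunk by chunk into one Base64 string.
    public static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        byte[] buff = new byte[CHUNK_SIZE];
        int n;
        while ((n = readFully(in, buff)) > 0) {
            // Pass only the bytes actually read in this round.
            sb.append(encoder.encodeToString(Arrays.copyOf(buff, n)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[7000]; // deliberately not a multiple of 3
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        String encoded = encode(new ByteArrayInputStream(data));
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(Arrays.equals(data, decoded)); // true
    }
}
```

For a real file, the same encode method could be given a FileInputStream in place of the in-memory stream used here.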

Look at:

    while ( ( r = imageInFile.read( buff)) > 0 ) {
      String imageData = encodeImage(buff);

read returns -1 on end-of-file, or the actual number of bytes that were read.

So the last buff might not be completely filled, and may even contain garbage from a prior read. So you need to use r.

As this is an assignment, the rest is up to you.

By the way:

 byte[] array = new byte[1024];

is more conventional in Java. The syntax:

 byte array[] = ...

was for compatibility with C/C++.
