Java: Reading from getResourceAsStream gets too many bytes

I'm trying to read a binary file using getResourceAsStream. The problem is that I get too many bytes back. According to ls, the file is 56374 bytes long, but when I read it in my code I consistently get 85194 bytes.

    InputStream fileData = checkNotNull(MyClass.class.getResourceAsStream(path));
    byte [] b = IOUtils.toByteArray(fileData);
    int count = b.length;

I get the same result with similar code:

    InputStream fileData = checkNotNull(MyClass.class.getResourceAsStream(path));
    byte [] b = new byte[1000*1000];
    int count  = fileData.read(b);

If I read the file directly instead of as a resource, everything is fine and I get the correct number of bytes.

    FileInputStream fis = new FileInputStream(path);
    byte [] b = new byte[1000*1000];
    int count  = fis.read(b);

The first bytes of the data I read match. Checking the output, the first byte that doesn't match is "CO", which comes out as "ef bf bd".

Maybe it's somehow trying to convert to or from UTF-8? Everything should be binary here. There is no text involved.

Any help appreciated.

Edit: I'm pretty sure I'm reading the correct file. If I rename the file, the read fails. If I change the name back, it works. I also renamed the resource in IntelliJ, which refactored the name in the code, and that still worked.

Edit2: I was wrong. I'm not looking at the correct file. I traced into getResourceAsStream. Our build system copies the file to a build output directory and runs from there. This destination file is the wrong size, so it appears the copy is doing some damage.

Note that the build copied the file again any time I changed the name, which is why I thought I had the right file.

I suspect that you are actually reading a different version of the file when you read it as a resource. The JVM reads resources from wherever the classloader locates them. So when you resolve the same path string once as a resource and once as a file, there is a good chance the two resolve to different things.
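One way to check which file the classloader actually picks is to print the resolved URL and count the bytes behind it. This is only a minimal sketch: the class name `ResourceCheck`, its helper methods, and the default resource name `/data.bin` are hypothetical and not taken from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

class ResourceCheck {
    // Returns the URL the classloader resolves for the resource name,
    // or null if the resource cannot be found.
    public static URL locate(Class<?> anchor, String resourceName) {
        return anchor.getResource(resourceName);
    }

    // Counts bytes by reading until end of stream. A single read() call
    // is not guaranteed to return the whole content, so loop.
    public static long countBytes(InputStream in) {
        try {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "/data.bin" is a placeholder; pass the real resource name as an argument.
        String name = args.length > 0 ? args[0] : "/data.bin";
        URL url = locate(ResourceCheck.class, name);
        System.out.println("resolved to: " + url);
        if (url != null) {
            try (InputStream in = url.openStream()) {
                System.out.println("bytes: " + countBytes(in));
            }
        }
    }
}
```

If the printed URL points into a build output directory rather than the source tree, compare the size of that copy with the original on disk.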

I doubt that the root issue is Unicode or UTF-8. Your examples show that you are reading the data using InputStream. That approach is encoding agnostic and will give you the raw bytes from the file(s). A regular InputStream doesn't try to decode the bytes it reads.

Having said that, it is definitely significant that the bytes you are reading are different. But that is also consistent with simply reading different files.
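For context on the "ef bf bd" bytes: EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character. The sketch below shows one way such bytes can appear, namely when some step (not the InputStream itself) decodes binary data as UTF-8 text and writes it back out. It assumes the "CO" in the question means the byte 0xC0, which is never valid in UTF-8; the class name `Utf8RoundTrip` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes.
    // Malformed input is replaced with U+FFFD during decoding.
    public static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated lowercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: 0xC0 stands in for the mismatching byte from the question.
        byte[] raw = {0x41, (byte) 0xC0, 0x42};
        System.out.println(hex(raw));            // 41 c0 42
        System.out.println(hex(roundTrip(raw))); // 41 ef bf bd 42
    }
}
```

Each invalid byte turns into three bytes, so a file damaged this way grows, which would fit a copy that is larger than the original.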
