
Java: read the actual file type by reading the first few bytes (forensics)

Hello, I need a way to read the first four bytes of any file using Java. Why the first four bytes? Because they are a forensic fingerprint of the actual file type (the file extension is not reliable, since it can be falsified).

http://en.wikipedia.org/wiki/File_carving

Now, reading a file this way (Java code below) reads the file "content"; I think it skips the file header information...? I can't get the magic number (the first four bytes), so I am unable to identify or confirm the true file type of a given specimen.

byte[] buffer = new byte[4];
// try-with-resources closes the stream even if read() throws
try (InputStream is = new FileInputStream("somwhere.in.the.dark")) {
    if (is.read(buffer) != buffer.length) {
        // do something
    }
}

Read First 4 Bytes of File

Any suggestions, please?

As Blank suggested, use Apache Tika: https://tika.apache.org

Here's the code. In this example, "test3_iamexe.txt" is an executable whose file extension was renamed to "txt" by an attacker.

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TestTika {

    public static void main(String[] args) {
        File file = new File("C:\\tmp\\test3_iamexe.txt");
        String contentType = null;

        // try-with-resources closes the stream whether or not parsing succeeds
        try (InputStream stream = new FileInputStream(file)) {
            AutoDetectParser parser = new AutoDetectParser();
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();

            // This step here is a little expensive
            parser.parse(stream, handler, metadata);

            // Metadata works like a map of names to values; you can loop over
            // metadata.names() to see what you need. Alternatively, I think
            // Content-Type is what you need
            contentType = metadata.get("Content-Type");
        } catch (IOException | SAXException | TikaException e) {
            // handle it
            e.printStackTrace();
        }

        System.out.println(contentType);
    }
}

I think you can use:

IOUtils.toByteArray(InputStream is)

See IOUtils.toByteArray (from Apache Commons IO) to convert your InputStream to a byte array, then take the first 4 bytes.
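Note that IOUtils.toByteArray reads the entire stream into memory, which is wasteful when only the header is needed. Also, a FileInputStream starts at byte 0 of the file, so nothing is skipped: the first bytes it returns are the header. Below is a minimal sketch using only the JDK that reads the first four bytes and compares them with a few well-known signatures. The class name MagicNumber, its method names, and the returned labels are made up for illustration, and the signature table is deliberately tiny.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class MagicNumber {

    // A few well-known signatures; a real tool would need a far larger table.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final byte[] MZ  = {'M', 'Z'};

    /** Reads up to n bytes from the start of the file (fewer if the file is shorter). */
    static byte[] readHeader(Path path, int n) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buf = new byte[n];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until done or EOF
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break;
                }
                total += r;
            }
            return Arrays.copyOf(buf, total);
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns an illustrative label for the file, based only on its leading bytes. */
    static String identify(Path path) throws IOException {
        byte[] header = readHeader(path, 4);
        if (startsWith(header, PNG)) return "PNG image";
        if (startsWith(header, PDF)) return "PDF document";
        if (startsWith(header, ZIP)) return "ZIP archive";
        if (startsWith(header, MZ))  return "DOS/Windows executable (MZ)";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MagicNumber <file>");
            return;
        }
        System.out.println(identify(Paths.get(args[0])));
    }
}
```

Run against a renamed executable such as the "test3_iamexe.txt" specimen, a check like this looks only at the content, so the misleading extension does not matter. Not every format has a four-byte signature at offset 0, which is why a library such as Tika is the safer choice for anything beyond a handful of types.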

Use the java.nio.file API for that; specifically, write your own FileTypeDetector.

I happen to be doing exactly that in one of my projects:

https://github.com/fge/java7-fs-more/tree/topic/filetypedetector

With this I am able to use Files.probeContentType() and get the exact type of the file back as a MIME string.

See the test file.


Now, how it works:

  • you write your own implementation of a FileTypeDetector (the project has an example that detects PNG files);
  • you make it return null if the detector can't determine the type;
  • you register the implementation in META-INF/services/java.nio.file.spi.FileTypeDetector;
  • test it...
  • and use Files.probeContentType().
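
The steps above can be sketched roughly as follows. This is not the code from the linked project; the class name PngFileTypeDetector is made up for illustration, and it only recognises the standard eight-byte PNG signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

// Hypothetical detector: recognises PNG files by content, ignoring the extension.
public class PngFileTypeDetector extends FileTypeDetector {

    // The eight-byte signature that every PNG file starts with
    private static final byte[] PNG_SIGNATURE =
        {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] header = new byte[PNG_SIGNATURE.length];
        try (InputStream in = Files.newInputStream(path)) {
            int total = 0;
            // read() may return fewer bytes than requested, so loop
            while (total < header.length) {
                int r = in.read(header, total, header.length - total);
                if (r < 0) {
                    return null; // file is shorter than the signature
                }
                total += r;
            }
        }
        // null tells Files.probeContentType() to try the next detector
        return Arrays.equals(header, PNG_SIGNATURE) ? "image/png" : null;
    }
}
```

To register it, the file META-INF/services/java.nio.file.spi.FileTypeDetector on the classpath would contain a single line with the detector's fully qualified class name (here just PngFileTypeDetector, assuming the default package). After that, Files.probeContentType(path) consults the installed detectors first and falls back to the system default detector when they all return null.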
