
Matching streamed file signature

I'm trying to determine the type of a file that is being received through a stream (in order to name it with the proper file extension). I've written a determineFormat(String str) method which is fed by a bytesToHex() method (the bytes come from the buffer). Unfortunately this doesn't work as expected: determineFormat() always returns the .aac extension even though an .mp3 is being received.

public String determineFormat(String str) {
    Pattern aacPattern = Pattern.compile("FFF1|FFF9");
    Pattern mp3Pattern = Pattern.compile("494433|FFFB");

    Matcher matcher = aacPattern.matcher(str);
    if (matcher.find()) {
        return "aac";
    }

    matcher = mp3Pattern.matcher(str);
    if (matcher.find()) {
        return "mp3";
    }

    return "unknown";
}

I feed my determineFormat() method using this:

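// Hex digit lookup table. It is not shown in the original snippet;
// this definition is an assumption about how it was declared.
private static final char[] hexArray = "0123456789ABCDEF".toCharArray();

public String bytesToHex(byte[] bytes) {
    char[] hexChars = new char[bytes.length * 2];
    for (int j = 0; j < bytes.length; j++) {
        int v = bytes[j] & 0xFF;                  // treat the byte as unsigned
        hexChars[j * 2] = hexArray[v >>> 4];      // high nibble
        hexChars[j * 2 + 1] = hexArray[v & 0x0F]; // low nibble
    }
    return new String(hexChars);
}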

I think it's because you match your pattern against the whole file, so find() succeeds as soon as FFF1 or FFF9 appears anywhere in the hex string. Change the patterns to

Pattern aacPattern = Pattern.compile("^(FFF1|FFF9)");
Pattern mp3Pattern = Pattern.compile("^(494433|FFFB)");
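To illustrate the difference, here is a minimal sketch comparing the unanchored and anchored patterns. The class name AnchorDemo, the helper matches(), and the sample hex string are made up for this example; the sample is an "ID3" marker followed by the bytes 0F FF 10, where "FFF1" appears only because it straddles byte boundaries.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    static final Pattern UNANCHORED_AAC = Pattern.compile("FFF1|FFF9");
    static final Pattern ANCHORED_AAC = Pattern.compile("^(FFF1|FFF9)");
    static final Pattern ANCHORED_MP3 = Pattern.compile("^(494433|FFFB)");

    // True if the pattern is found anywhere it is allowed to match.
    static boolean matches(Pattern pattern, String hex) {
        return pattern.matcher(hex).find();
    }

    public static void main(String[] args) {
        // Hypothetical input: bytes 49 44 33 ("ID3"), then 0F FF 10.
        String hex = "4944330FFF10";

        System.out.println(matches(UNANCHORED_AAC, hex)); // true: "FFF1" found mid-string
        System.out.println(matches(ANCHORED_AAC, hex));   // false: string does not start with it
        System.out.println(matches(ANCHORED_MP3, hex));   // true: string starts with 494433
    }
}
```

Note that the anchored patterns only help when the buffer really holds the start of the file.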

With anchored patterns, it is of course enough to pass in only the first few bytes. To get the bytes in hex you could do something simpler, like

StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
    sb.append(String.format("%02X", b));
}
// sb.toString();
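Combining the two suggestions, a sketch that converts only the leading bytes of the buffer might look like this. The class name HexPrefix and the method hexPrefix() with its maxBytes parameter are hypothetical names introduced for the example.

```java
final class HexPrefix {
    // Returns the first maxBytes bytes (or fewer, if the array is shorter)
    // as an uppercase hex string, two characters per byte.
    static String hexPrefix(byte[] bytes, int maxBytes) {
        int n = Math.max(0, Math.min(maxBytes, bytes.length));
        StringBuilder sb = new StringBuilder(n * 2);
        for (int i = 0; i < n; i++) {
            // %02X formats a negative byte as its unsigned two-digit hex value.
            sb.append(String.format("%02X", bytes[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] buffer = {(byte) 0xFF, (byte) 0xF3, 0x42, (byte) 0xC0};
        System.out.println(hexPrefix(buffer, 3)); // prints FFF342
    }
}
```

The result can then be passed to determineFormat() instead of the hex string for the whole buffer.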

The problem turned out to be simpler than it seemed. I was testing my app with an MPEG-2 Audio Layer 3 stream with ID3v2. I decided to read the raw hex-string output:

0DCB1C992B37173740244875C143D50ACDBA0422CD01D73D3C78F05ED7BBC2B33F9D78A7FFF342C0241C6B56B11EC1867984C20F42A4FAC5B9C0
42220314C006D94E124673CD4CC27FC2FCE12215410F12086BE5A3EDFC6DB2BEB0EAEC6EAAA4BF997FFB3337F914AB1A89C808EA6D338912D72E
99CE11E899999D3AE1092590FB2B71D736DC544B0AFD1035A3FFF340C00E178B62E5BE48C46F04B8EFC106AE3F17DDE08B5FD48672EBEABB216A
8438B6FB3B33BF91D3F3EBFCE14184320532ABA37FFD59BFF6ABAD1AA9AADEE73220679D2C7DDBAB766433A99D8CA752B383067465691750A24A
00F32A5078E29258F6D87A620AFFF342C00A158B22E5BE5944BAE8BA2C54739BE486B719A76DF5FD984D5257DBEAC43B238598EFAB3592DE8DD5

The "real" file signature turned out to be FFF3. After that I found this site, which describes MPEG Layer 3 files: http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=687&strPageToDisplay=signatures . Finally I was able to get my code to work nicely with corrected patterns:

Pattern aacPattern = Pattern.compile("(FFF1|FFF9)");
Pattern mp3Pattern = Pattern.compile("(FFF3|FFFA|FFFB)");

At the beginning I was misled by the signature information I got from this site: http://www.garykessler.net/library/file_sigs.html
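Since the dump above does not begin with a signature, the patterns have to search inside the buffer, and a regex over the hex string can also match at an odd offset that straddles two bytes. As a rough sketch, assuming the same two-byte signatures as in the corrected patterns, the scan could be done on the raw bytes instead. The class name SignatureScan and the length parameter are assumptions for this example, and unlike the regex version it returns the first signature by position rather than checking AAC first.

```java
final class SignatureScan {
    // Scans the first `length` bytes for a byte-aligned two-byte signature.
    // Returns "aac", "mp3", or "unknown".
    static String determineFormat(byte[] buf, int length) {
        int end = Math.min(length, buf.length);
        for (int i = 0; i + 1 < end; i++) {
            if ((buf[i] & 0xFF) != 0xFF) {
                continue; // every signature used here starts with 0xFF
            }
            switch (buf[i + 1] & 0xFF) {
                case 0xF1:
                case 0xF9:
                    return "aac"; // FFF1, FFF9
                case 0xF3:
                case 0xFA:
                case 0xFB:
                    return "mp3"; // FFF3, FFFA, FFFB
                default:
                    break;
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Bytes 78 A7 FF F3 42 C0, taken from the first line of the dump above.
        byte[] sample = {0x78, (byte) 0xA7, (byte) 0xFF, (byte) 0xF3, 0x42, (byte) 0xC0};
        System.out.println(determineFormat(sample, sample.length)); // prints mp3
    }
}
```

This is still a heuristic: the same byte pairs can occur by chance inside audio data, so a match in the middle of a stream is not proof of the format.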
