Apache Tika - Incorrect MIME (content type) detection

I'm trying to detect the content type of a file passed to a web service inside a SOAP envelope. The file can be supplied in two ways:

  • from its URL,
  • from its content (base64-encoded data).

At this point, I'm able to turn the file into a stream buffer, but all my attempts to get its content type have failed. The content type is detected when the file extension is given; otherwise the content is always detected as "text/plain".

Below is my class code:

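import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.UUID;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaMetadataKeys;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MetadataAnalyser {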

private InputStream _is;

private File _file;

private void initializeAttributes() {

    _is = null;
    _file= null;

}


private void createTemporaryFile(byte[] pData) {

    try {
        // A null suffix makes createTempFile use ".tmp" as the extension.
        _file = File.createTempFile(
                UUID.randomUUID().toString().replace("-", ""),
                null,
                new File("C:\\Users\\Florent\\Documents\\NetBeansProjects\\ServiceEdition\\tmp"));
        _file.deleteOnExit();
        // try-with-resources closes the stream even if write() fails.
        try (FileOutputStream fos = new FileOutputStream(_file)) {
            fos.write(pData);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

}

public MetadataAnalyser(byte[] pData) {

    initializeAttributes();
    _is = new ByteArrayInputStream(pData);
    createTemporaryFile(pData);

}

public MetadataAnalyser(InputStream pIs) {

    initializeAttributes();
    _is = pIs;
    _file = null;

}

public MetadataAnalyser(File pFile) {

    initializeAttributes();
    try {
        _file = pFile;
        _is = new FileInputStream(_file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }

}

public MetadataAnalyser(String pFile) {

    initializeAttributes();
    try {
        _file = new File(pFile);
        if (_file.exists()) {
            _is = new FileInputStream(_file);
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }

}

public String getContentType() {

    String mimeType = null;

    AutoDetectParser parser = new AutoDetectParser();
    // With an empty parser map no content is parsed; only type detection runs.
    parser.setParsers(new HashMap<MediaType, Parser>());
    Metadata metadata = new Metadata();
    if (_file != null) {
        // The file name is passed to the detector as a hint.
        metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, _file.getName());
    }
    // Read from the file when there is one, otherwise from the stream given to
    // the constructor. The stream is closed once detection is done.
    try (InputStream is = (_file != null) ? new FileInputStream(_file) : _is) {
        parser.parse(is, new DefaultHandler(), metadata, new ParseContext());
        mimeType = metadata.get(HttpHeaders.CONTENT_TYPE);
    } catch (IOException | SAXException | TikaException e) {
        e.printStackTrace();
    }
    // Returning after the try block (not from finally) lets unexpected
    // runtime exceptions propagate instead of being silently discarded.
    return mimeType;

}

}

So, how can I detect the MIME type even when the file extension is unknown?

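I don't think you can detect the MIME type without an extension. You need to know which system is writing the file and what kind of file is expected, and set the MIME type accordingly (I assume you use it in your response).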

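You need to make sure the content is decoded before it is sent to Tika. And no, the extension is absolutely not required: detection is done through the well-known MIME magic process described here: https://tika.apache.org/1.1/detection.html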
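To illustrate why decoding matters, here is a minimal sketch that uses only the JDK. It is not Tika's implementation: the class `MagicSniffDemo` and the helpers `sniff` and `startsWith` are hypothetical names made up for this example, and the toy detector knows just two signatures (PNG and PDF). It shows that the base64 text of a binary file is plain printable ASCII, so a magic-based detector has nothing to match until the payload is decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MagicSniffDemo {

    // PNG files start with these eight bytes (the PNG signature).
    private static final byte[] PNG_MAGIC = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    // "%PDF" marks the start of a PDF file.
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    /** Returns true when data begins with the given magic bytes. */
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Toy detector: two signatures, then a printable-ASCII check, then a binary fallback. */
    static String sniff(byte[] data) {
        if (startsWith(data, PNG_MAGIC)) {
            return "image/png";
        }
        if (startsWith(data, PDF_MAGIC)) {
            return "application/pdf";
        }
        for (byte b : data) {
            boolean printable = (b >= 0x20 && b < 0x7F) || b == '\n' || b == '\r' || b == '\t';
            if (!printable) {
                return "application/octet-stream";
            }
        }
        return "text/plain";
    }

    public static void main(String[] args) {
        // Stand-in for what the SOAP envelope carries: the base64 text of a PNG signature.
        String payload = Base64.getEncoder().encodeToString(PNG_MAGIC);
        byte[] stillEncoded = payload.getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = Base64.getDecoder().decode(payload);

        System.out.println(payload);             // prints iVBORw0KGgo=
        System.out.println(sniff(stillEncoded)); // prints text/plain
        System.out.println(sniff(decoded));      // prints image/png
    }
}
```

In the class from the question, this would mean decoding the base64 payload first and handing only the decoded bytes to Tika (for example through the constructor that takes a `byte[]`), so that the detector sees the real file signature and not ASCII text.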
