
How to get Sub-type MIME of an Office document, instead of getting OOXML in Tika

I am using Apache Tika to validate file types and make sure no one is trying to send a malicious or fake file under the guise of a genuine one. However, even if I wrap the InputStream in a TikaInputStream, or use OOXMLParser or OfficeParser, it still returns application/x-tika-ooxml instead of application/vnd.openxmlformats-officedocument.wordprocessingml.document. How do I get it to return the subtype?

    import java.io.IOException;

    import org.apache.tika.Tika;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.springframework.web.multipart.MultipartFile;

    public static boolean isValidFileMimeType(TikaInputStream stream, String[] validMimes) {
        Tika tika = new Tika();
        try {
            Metadata meta = new Metadata();
            // detect() returns the detected type; it does not store it
            // under "Content-Type" in the Metadata object.
            String mimetype = tika.detect(stream, meta);
            logger.debug("MIME type from TIKA is : [" + mimetype + "]");
            logger.debug(meta.toString());
            //return isValidFileMimeType(mimetype, validMimes);
            return true;
        } catch (Exception e) {
            logger.error("Error validating InputStream: ", e);
            return false;
        }
    }

    public static boolean isValidFileMimeType(MultipartFile file, String[] mimeTypes) {
        boolean isValidFile = false;
        // try-with-resources closes the TikaInputStream even if validation fails
        try (TikaInputStream in = TikaInputStream.get(file.getInputStream())) {
            isValidFile = DataValidator.isValidFileMimeType(in, mimeTypes);
        } catch (IOException e) {
            logger.error("Error while validating file mime type: ", e);
        }
        return isValidFile;
    }

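Just import/use the Tika parsers (the tika-parsers artifact, or tika-parsers-standard-package in Tika 2.x). With only tika-core on the classpath, detection stops at the generic application/x-tika-ooxml; the container-aware detectors that look inside the ZIP package ship with the parsers and need a TikaInputStream to work.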
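To see why the subtype cannot be read from the first bytes alone, here is a minimal stdlib-only sketch of the idea behind container-aware detection. It is not Tika's implementation. The class `OoxmlSubtypeSketch` and its methods are hypothetical names invented for this illustration. It assumes the package's `[Content_Types].xml` declares the main part with a content type ending in `.main+xml`, and it uses a regular expression rather than a real XML parser to keep the example short. Macro-enabled formats, which use different content types, are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Apache Tika.
final class OoxmlSubtypeSketch {

    // Matches the main part's content type, e.g.
    // application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
    // and captures everything before the ".main+xml" suffix.
    private static final Pattern MAIN_PART = Pattern.compile(
        "ContentType\\s*=\\s*\"(application/vnd\\.openxmlformats-officedocument\\.[^\"]*?)\\.main\\+xml\"");

    /** Returns the specific OOXML MIME type, or null if none can be found. */
    static String detectOoxmlSubtype(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("[Content_Types].xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return subtypeFromContentTypes(xml);
                }
            }
            return null; // not a ZIP, or a ZIP without an OOXML content-type part
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Extracts the subtype from the text of [Content_Types].xml (simplified). */
    static String subtypeFromContentTypes(String xml) {
        Matcher m = MAIN_PART.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    /** Builds a one-entry ZIP in memory, only so the sketch can be tried out. */
    static byte[] zipWithEntry(String name, String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For real validation, prefer Tika's own detectors over a hand-rolled check like this one. The sketch only shows that the detector has to open the ZIP container to tell a .docx from an .xlsx or .pptx.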


 