
How to extract a formula from an Excel cell in Java to identify CMD functions?

Here is a brief description of the problem.

I am working on identifying Excel files that contain CMD functions such as "=cmd|'/C calc'!A0" (DDE formulas), for security filtering. We currently have to use Java to parse these files.
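
A formula such as "=cmd|'/C calc'!A0" follows the DDE pattern service|'topic'!item. If the raw formula text is available, a minimal heuristic sketch for pulling those parts out could look like this. The class name, method name and regular expression are assumptions for illustration; they are not part of POI or Tika, and only the single-quoted topic form from the question is covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DdeFormulaScanner {

    /** One DDE reference of the form service|'topic'!item, e.g. cmd|'/C calc'!A0. */
    record DdeRef(String service, String topic, String item) {}

    // Heuristic: service name, '|', single-quoted topic, '!', item name.
    // Other quoting forms Excel may accept are not covered.
    private static final Pattern DDE = Pattern.compile(
            "([A-Za-z0-9_.]+)\\s*\\|\\s*'([^']*)'\\s*!\\s*([A-Za-z0-9_$]+)");

    /** Returns every DDE-style reference found anywhere in the formula text. */
    static List<DdeRef> findDdeRefs(String formula) {
        List<DdeRef> refs = new ArrayList<>();
        Matcher m = DDE.matcher(formula);
        while (m.find()) {
            refs.add(new DdeRef(m.group(1), m.group(2), m.group(3)));
        }
        return refs;
    }

    public static void main(String[] args) {
        String[] formulas = {"=cmd|'/C calc'!A0", "SUM(1+1)*cmd|' /C calc'!A0", "SUM(A1:A3)"};
        for (String f : formulas) {
            System.out.println(f + " -> " + findDdeRefs(f));
        }
    }
}
```

Because the pattern is searched anywhere in the string, it also finds the reference embedded in "SUM(1+1)*cmd|' /C calc'!A0"; the plain "SUM(A1:A3)" yields an empty list. A filter would then block any result whose service equals "cmd" (ignoring case), and treat a miss as "not proven safe" rather than "safe".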

I tried the following two approaches:

  1. Apache POI. I can parse the Excel file as a Workbook and get every cell value. The problem is that the cell values I get are already evaluated, and there doesn't seem to be a way to check whether a cell starts with "cmd".
  2. Tika. It's similar here. I am able to get the metadata, but when I use the handler to get the text of the Excel file, I get something like "#REF!", which is not what we need.
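
Since an .xlsx file is a ZIP package of XML parts, the DDE link can also be looked for in the raw package instead of in evaluated cell values. In Office Open XML a DDE link is stored in an external link part as a ddeLink element with ddeService and ddeTopic attributes. The sketch below uses only the JDK; it assumes Excel's default part names (xl/externalLinks/externalLinkN.xml) rather than following the package relationships, uses a regular expression instead of an XML parser, and its class and method names are hypothetical. A production filter should also cap entry sizes to guard against zip bombs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class XlsxDdeScanner {

    // ddeService attribute of a <ddeLink> element (simple heuristic, not a full XML parse)
    private static final Pattern DDE_SERVICE =
            Pattern.compile("<(?:\\w+:)?ddeLink\\b[^>]*\\bddeService=\"([^\"]*)\"");

    /** Returns the ddeService values found in xl/externalLinks/*.xml parts of an .xlsx file. */
    static List<String> findDdeServices(String path) throws IOException {
        List<String> services = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith("xl/externalLinks/") || !name.endsWith(".xml")) {
                    continue; // only external link parts are expected to hold DDE links
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    Matcher m = DDE_SERVICE.matcher(xml);
                    while (m.find()) {
                        services.add(m.group(1));
                    }
                }
            }
        }
        return services;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(findDdeServices(args[0]));
    }
}
```

For the formula in the question, the expected service name would be "cmd"; a non-empty result of any kind is a reasonable signal to quarantine the file.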

Does anyone have suggestions on how I can go about this? It would be really helpful.

Thank you.

I did find an elaborate solution, which I adapted from the Stack Overflow answer linked above. It handles both XSSF and HSSF:

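    import java.util.HashSet;
    import java.util.List;
    import java.util.Locale;
    import java.util.Set;

    import org.apache.poi.hssf.model.InternalWorkbook;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.ss.formula.EvaluationWorkbook;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.xssf.model.ExternalLinksTable;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    public class DdeLinkCheck { // class and method names are illustrative

        // Fail-safe upper bound on scanned external sheets (value is an assumption)
        private static final int maxExterLinks = 1000;

        /** Returns false if the workbook contains a DDE link / "cmd" external reference. */
        public boolean isSafe(Workbook workbook) {
            if (workbook instanceof XSSFWorkbook) {
                // .xlsx: DDE links live in external link parts as <ddeLink> elements
                XSSFWorkbook xssfWorkbook = (XSSFWorkbook) workbook;
                List<ExternalLinksTable> externalLinks = xssfWorkbook.getExternalLinksTable();
                for (ExternalLinksTable linksTable : externalLinks) {
                    if (linksTable.getCTExternalLink().isSetDdeLink()) {
                        return false;
                    }
                }
            } else if (workbook instanceof HSSFWorkbook) {
                // .xls: check the workbook names of the external sheet references
                Set<String> references = getWorkbookReferences((HSSFWorkbook) workbook);
                if (containsStartsWithSubString(references, "cmd")) {
                    return false;
                }
            }
            return true;
        }

        private Set<String> getWorkbookReferences(HSSFWorkbook wb) {
            Set<String> references = new HashSet<>();
            // getInternalWorkbook() is an internal POI API and may change between versions
            InternalWorkbook internalWorkbook = wb.getInternalWorkbook();
            int extSheetIdx = 0;
            EvaluationWorkbook.ExternalSheet extSheet;
            // stops at the first index for which getExternalSheet returns null
            while ((extSheet = internalWorkbook.getExternalSheet(extSheetIdx++)) != null) {
                references.add(extSheet.getWorkbookName());
                if (extSheetIdx > maxExterLinks) { // fail safe
                    return references;
                }
            }
            return references;
        }

        // Helper not shown in the original; this case-insensitive version is an assumption
        private static boolean containsStartsWithSubString(Set<String> values, String prefix) {
            String p = prefix.toLowerCase(Locale.ROOT);
            for (String value : values) {
                if (value != null && value.toLowerCase(Locale.ROOT).startsWith(p)) {
                    return true;
                }
            }
            return false;
        }
    }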
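
The code above branches on XSSF (.xlsx) versus HSSF (.xls). If you want to pick a strategy before handing the file to POI, one option is to look at the first bytes: OLE2 compound files (.xls, .doc) start with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based OOXML packages (.xlsx, .docx) start with 50 4B 03 04. POI also ships a FileMagic helper for this; the sketch below uses only the JDK, and its class and enum names are hypothetical. Note that password-protected OOXML files are stored as OLE2 containers, so OLE2 does not always mean .xls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ContainerSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound file signature (.xls/.doc) and ZIP local-file header (.xlsx/.docx)
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file from its leading bytes. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Container.OLE2;
        if (startsWith(header, ZIP)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sniff(Path.of(args[0])));
    }
}
```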

Any suggestions are welcome!

Unfortunately, I am still working on how to detect this in Word documents :)
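
In Word, DDE is usually embedded through the DDE or DDEAUTO field codes. In a .docx package these appear in field instructions, either as text of w:instrText runs or in the w:instr attribute of a w:fldSimple element. A rough JDK-only sketch under those assumptions follows; the class and method names are hypothetical, it uses regular expressions instead of an XML parser, and it assumes the default word/*.xml part names. It does not handle legacy .doc files, and obfuscated or nested field constructions can evade it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class DocxDdeScanner {

    // Text of <w:instrText> runs and the w:instr attribute of <w:fldSimple> (heuristic regexes)
    private static final Pattern INSTR_TEXT = Pattern.compile("<w:instrText\\b[^>]*>([^<]*)</w:instrText>");
    private static final Pattern INSTR_ATTR = Pattern.compile("\\bw:instr=\"([^\"]*)\"");
    // DDE or DDEAUTO keyword, case-insensitive; no leading \b so text joined from runs still matches
    private static final Pattern DDE_FIELD = Pattern.compile("(?i)DDE(?:AUTO)?\\b");

    /** True if the joined field instructions of one XML part mention DDE/DDEAUTO. */
    static boolean containsDdeField(String xml) {
        StringBuilder instructions = new StringBuilder();
        Matcher m = INSTR_TEXT.matcher(xml);
        while (m.find()) {
            instructions.append(m.group(1)); // one instruction may be split over several runs
        }
        m = INSTR_ATTR.matcher(xml);
        while (m.find()) {
            instructions.append(' ').append(m.group(1));
        }
        return DDE_FIELD.matcher(instructions).find();
    }

    /** Scans every word/*.xml part of a .docx package. */
    static boolean containsDdeField(ZipFile docx) throws IOException {
        Enumeration<? extends ZipEntry> entries = docx.entries();
        while (entries.hasMoreElements()) {
            ZipEntry e = entries.nextElement();
            if (e.getName().startsWith("word/") && e.getName().endsWith(".xml")) {
                try (InputStream in = docx.getInputStream(e)) {
                    if (containsDdeField(new String(in.readAllBytes(), StandardCharsets.UTF_8))) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        try (ZipFile zip = new ZipFile(args[0])) {
            System.out.println(containsDdeField(zip));
        }
    }
}
```

Joining all runs of a part before matching, and dropping the leading word boundary, trades a few false positives for fewer misses, which is usually the right bias for a security filter.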

There is no way to get the complete formula string of a cell when it contains a formula such as the one below:

    SUM(1+1)*cmd|' /C calc'!A0

When I use myCell.getCellFormula(), the result is SUM(1+1)*A1, which is not what I expected.

I want to block "=cmd|" or "cmd|" if it is found in a particular cell of the sheet.
