How to extract the formula from an Excel cell in Java to identify cmd functions?

Here is a brief description of the problem.

I am working on identifying Excel files that contain CMD functions such as "=cmd|'/C calc'!A0" for security filtering. We currently have to use Java to parse these files.

I tried the following two approaches:

  1. Apache POI. I can parse the Excel file as a Workbook and get every cell value. The problem is that the cell we get back is already evaluated, and there doesn't seem to be a way to check whether the cell starts with "cmd".
  2. Tika. It's similar here. I am able to get the metadata, but when I use the handler to get the text of the Excel file, I get something like !#REF, which is not what we need.

Does anyone have suggestions on how I can go about this? It would be really helpful.

Thank you.

I did find an elaborate solution, which I used, based on the Stack Overflow post linked above. It handles both XSSF and HSSF.

            if (workbook instanceof XSSFWorkbook) {
                XSSFWorkbook xssfWorkbook = (XSSFWorkbook) workbook;
                List<ExternalLinksTable> externalLinks = xssfWorkbook.getExternalLinksTable();
                for (ExternalLinksTable linksTable : externalLinks) {
                    if (linksTable.getCTExternalLink().isSetDdeLink()) {
                        return false;
                    }
                }
            } else {
                HSSFWorkbook hssfWorkbook = (HSSFWorkbook) workbook;
                Set<String> references = getWorkbookReferences(hssfWorkbook);
                if (containsStartsWithSubString(references, "cmd")) {
                    return false;
                }
            }


    private Set<String> getWorkbookReferences (HSSFWorkbook wb)
    {
        Set<String> references = new HashSet<>();
        InternalWorkbook internalWorkbook = wb.getInternalWorkbook();
        int extSheetIdx = 0;
        while (internalWorkbook.getExternalSheet(extSheetIdx) != null) {
            EvaluationWorkbook.ExternalSheet extSheet =
                internalWorkbook.getExternalSheet(extSheetIdx++);
            references.add(extSheet.getWorkbookName());

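            // Fail-safe: stop once more than maxExterLinks entries have been read
            // (maxExterLinks is defined elsewhere and not shown here).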
            if (extSheetIdx > maxExterLinks) {
                return references;
            }
        }

        return references;
    }
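The snippet above calls containsStartsWithSubString without showing it. A minimal sketch of what such a helper might look like follows; the class name ReferenceChecks, the case-insensitive comparison, and the skipping of null names are my assumptions, not part of the original code.

```java
import java.util.Locale;
import java.util.Set;

public class ReferenceChecks {

    /**
     * Returns true if any non-null reference starts with the given prefix.
     * Assumption: the comparison ignores case and leading whitespace,
     * so "CMD" and " cmd" both match the prefix "cmd".
     */
    public static boolean containsStartsWithSubString(Set<String> references, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        for (String ref : references) {
            if (ref == null) {
                continue; // skip entries that carry no name
            }
            if (ref.trim().toLowerCase(Locale.ROOT).startsWith(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With this version, a set containing "CMD" or " cmd" would be flagged for the prefix "cmd", while "Book2.xls" would not.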

Any suggestions are welcome!
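For .xlsx files, one possible alternative to the XSSF branch is to scan the raw package with java.util.zip, without loading a workbook at all. The sketch below assumes that DDE links are stored as a ddeLink element in parts under xl/externalLinks/ and that those parts are UTF-8 encoded; the class and method names are hypothetical. It reads each matching part fully into memory, so a real filter would probably want a size limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class XlsxDdeScan {

    /** Returns true if any XML part under xl/externalLinks/ mentions "ddeLink". */
    public static boolean hasDdeLink(InputStream xlsx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("xl/externalLinks/") || !name.endsWith(".xml")) {
                    continue; // only external-link parts are of interest
                }
                // Read the current entry fully (no size cap in this sketch).
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                // A substring match also covers namespace-prefixed element names.
                if (xml.contains("ddeLink")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

This only covers the zip-based .xlsx format; the binary .xls format still needs something like the HSSF branch above.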

I am still working on how to identify this in Word documents, unfortunately :)

There is no way to get the complete string of the cell when it contains a formula such as the following:

SUM(1+1)*cmd|' /C calc'!A0

When I use myCell.getCellFormula(), the result is SUM(1+1)*A1, which is not what I expected.

I wanted to block =cmd| or cmd| if it is found in a particular cell in the sheet.
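If the raw formula text can be obtained some other way (for example from a CSV export or from the raw sheet data rather than from getCellFormula()), the check itself could be a small regular expression. This is only a sketch under that assumption; the class name, the method name, and the exact pattern are hypothetical and only look for "cmd" followed by the DDE separator "|".

```java
import java.util.regex.Pattern;

public class DdeFormulaFilter {

    // "cmd" as a whole word, optional whitespace, then "|"; case-insensitive.
    private static final Pattern CMD_DDE = Pattern.compile("(?i)\\bcmd\\s*\\|");

    /** Returns true if the raw formula text contains a cmd| DDE reference. */
    public static boolean looksLikeCmdDde(String rawFormula) {
        return rawFormula != null && CMD_DDE.matcher(rawFormula).find();
    }
}
```

Because the pattern uses find() rather than matching from the start, it flags both "=cmd|'/C calc'!A0" and a reference buried inside a larger expression such as "SUM(1+1)*cmd|' /C calc'!A0", but not "SUM(1+1)*A1".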
