
Excel to text conversion: properly handle formulas and empty cells

I'm trying to convert an Excel file into a tab-separated text file via Apache POI. The Excel file has some cells containing formulas and some empty cells.

Here's a sample of the original Excel file: [screenshot not included]

Here's an extract of the final output:

'US'    'USORACLEAP'    SYSTEMREFERENCE SUPPLIERID  SUPPLIERNAME    CLASSIFICATION  VENDOR_SITE_CODE    SUPPLIERADDRESS1    SUPPLIERADDRESS2    STATE   ZIPCODE COUNTRY SOURCE  INVOICENUM  INVOICEDATE PAYMENTDATE LINE_DESC   GL_COMPANY  GL_CODE GL_DESCR    COSTCENTER  CC_DESCR    CURRENCY_CODE   CHECK_NUMBER    NUM_DOCS    SPEND   TERM    PAYMENT_METHOD  SYSTEM_APPROVED PO_DISTRIBUTION_ID  WALKER_COST_CENTER  RGL_LEDGER_ENTITY   
US  US Oracle AP        RANDBETWEEN(3000,100000)    "TEXT "&D2  VENDOR  "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  ONTARIO RIGHT(D2,5) US  "TEXT "&D3  "TEXT "&D3  RANDBETWEEN(43831, 44150)   RANDBETWEEN(44105,44135)    "TEXT "&D3  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000, 60000)   "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           "TEXT"&D2   X2  
US  US Oracle AP        31836   "TEXT "&D3  1099    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  NY  RIGHT(D3,5) US  "TEXT "&D4  "TEXT "&D4  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D4  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           GSUEDCM03   AF2 
US  US Oracle AP        3504    "TEXT "&D4  VENDOR  "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  NY  RIGHT(D4,5) US  "TEXT "&D5  "TEXT "&D5  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D5  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF3 
US  US Oracle AP        3504    "TEXT "&D5  VENDOR  "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  NY  RIGHT(D5,5) US  "TEXT "&D6  "TEXT "&D6  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D6  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF4 
US  US Oracle AP        3504    "TEXT "&D6  VENDOR  "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  NY  RIGHT(D6,5) US  "TEXT "&D7  "TEXT "&D7  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D7  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF5 

As you can see, the first row contains the column headers. Some of the cells (D1) are written out as the formula text instead of the computed value. The third column doesn't have any values, so the whole remaining content is shifted to the left in the text file.

Here's the code:

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);

            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case STRING:
                            sb.append(cell.getStringCellValue()).append(delimiter);
                            break;
                        case BOOLEAN:
                            sb.append(cell.getBooleanCellValue()).append(delimiter);
                            break;
                        case NUMERIC:
                            sb.append(cell.getNumericCellValue()).append(delimiter);
                            break;
                        case FORMULA:
                            sb.append(cell.getCellFormula()).append(delimiter);
                            break;
                        default:
                            sb.append(EMPTY).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

Can someone please point out what changes I should make in my code to fix the alignment and the formula issue? PS: I'm using TAB (\t) as my delimiter.

UPDATE: Here's the updated code after the suggestions:

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            DataFormatter formatter = new DataFormatter();
            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    if (cell != null) {
                        sb.append(format("%-20s", formatter.formatCellValue(cell, evaluator))).append(delimiter);
                    } else {
                        sb.append(format("%-20s", EMPTY)).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

To get the value of a formula cell rather than the formula itself, evaluate it with a FormulaEvaluator:

    FormulaEvaluator evaluator = myWorkbook.getCreationHelper().createFormulaEvaluator();

    CellValue cellValue = evaluator.evaluate(cell); // where cell is your formula cell

    // evaluate() returns null for a null or blank cell, so check before switching
    if (cellValue != null) {
        // POI 4.0+: getCellType() returns the CellType enum; the old
        // Cell.CELL_TYPE_* int constants were deprecated and later removed
        switch (cellValue.getCellType()) {
            case BOOLEAN:
                System.out.println(cellValue.getBooleanValue());
                break;
            case NUMERIC:
                System.out.println(cellValue.getNumberValue());
                break;
            case STRING:
                System.out.println(cellValue.getStringValue());
                break;
            case BLANK:
                break;
            case ERROR:
                break;
        }
    }

EDIT:

Regarding the alignment issue, see: How can I pad a String in Java?
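A minimal sketch of the padding approach from that link, using only String.format (the class PadDemo and helper padRight are hypothetical names for illustration). Note that "%-20s" only pads: a value longer than the width is written out in full, so the columns drift again as soon as one value exceeds it.

```java
class PadDemo {
    // Left-justify s in a field of at least `width` characters (hypothetical helper).
    // "%-<width>s" appends trailing spaces but never truncates a longer value.
    static String padRight(String s, int width) {
        return String.format("%-" + width + "s", s);
    }

    public static void main(String[] args) {
        String shortValue = padRight("US", 20);
        String longValue = padRight("RANDBETWEEN(2000000,2100000)", 20);
        System.out.println("[" + shortValue + "]"); // 20 characters between the brackets
        System.out.println("[" + longValue + "]");  // 28 characters, not cut to 20
    }
}
```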

If the requirement is writing Excel data into a text file, then all cell values need to be obtained as String. A convenient way to do so is using the DataFormatter of apache poi. Using DataFormatter, you get the cell values as they are shown in the Excel sheet, e.g. with number formats and date formats applied. And if you use DataFormatter together with a FormulaEvaluator, then formulas get evaluated and the evaluated values are converted to String.
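To illustrate why the raw numeric value is not what the sheet shows: values such as RANDBETWEEN(43831, 44150) fall in the range of Excel date serial numbers, so if those cells carry a date format, the stored number and the displayed text differ. This stdlib-only sketch mimics that single conversion step by hand, assuming the 1900 date system and serials above 60 (it ignores Excel's fictitious 1900-02-29). The class ExcelSerialDemo is hypothetical; DataFormatter does the equivalent internally based on each cell's actual format, so this is an illustration, not a replacement for it.

```java
import java.time.LocalDate;

class ExcelSerialDemo {
    // Day 0 for serials > 60 in Excel's 1900 date system; 1899-12-30 (not 12-31)
    // compensates for Excel treating 1900 as a leap year.
    private static final LocalDate EXCEL_EPOCH = LocalDate.of(1899, 12, 30);

    // Convert a whole-number Excel date serial (> 60) to a calendar date.
    static LocalDate fromSerial(long serial) {
        return EXCEL_EPOCH.plusDays(serial);
    }

    public static void main(String[] args) {
        double raw = 43831;                          // the kind of value getNumericCellValue() returns
        System.out.println(String.valueOf(raw));     // prints 43831.0
        System.out.println(fromSerial((long) raw));  // prints 2020-01-01
    }
}
```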

To avoid losing empty cells, one needs to get the cell count first, because the cell iterator skips cells that are not physically present in the file, which is usually the case for empty cells. For example, the cell count of the header row can be used as the cell count for every following row as well.
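The difference between iterating only the existing cells and indexing up to a fixed cell count can be modelled without POI. In this stdlib-only sketch a TreeMap<Integer, String> (column index to value) stands in for a POI Row in which one cell was never created; that stand-in and the names SparseRowDemo, iterateExisting and iterateByIndex are assumptions for illustration. Iterating the stored entries drops column C and shifts everything after it, while looping over 0..cellCount-1 and writing an empty string for a missing index keeps every column in place, which is what row.getCell(c, CREATE_NULL_AS_BLANK) plus DataFormatter achieves in the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SparseRowDemo {
    // Like row.cellIterator(): visits only cells that actually exist.
    static String iterateExisting(Map<Integer, String> row, String delimiter) {
        return String.join(delimiter, row.values());
    }

    // Like looping c = 0..cellCount-1 with CREATE_NULL_AS_BLANK:
    // a missing index becomes an empty field, so the columns stay aligned.
    static String iterateByIndex(Map<Integer, String> row, int cellCount, String delimiter) {
        List<String> fields = new ArrayList<>();
        for (int c = 0; c < cellCount; c++) {
            fields.add(row.getOrDefault(c, ""));
        }
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "US");
        row.put(1, "US Oracle AP");
        // index 2 (column C) was never created
        row.put(3, "31836");
        System.out.println(iterateExisting(row, "|"));   // US|US Oracle AP|31836
        System.out.println(iterateByIndex(row, 4, "|")); // US|US Oracle AP||31836
    }
}
```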

So the whole code would be as simple as this:

import org.apache.poi.ss.usermodel.*;
import java.io.*;

class ExcelToText {
 static final String DEFAULT_LINE_END = System.getProperty("line.separator");

 static void convertXlsToText(InputStream inputStream, String delimiter, OutputStream outputStream) throws Exception {
  StringBuilder sb = new StringBuilder();
  Workbook workbook = WorkbookFactory.create(inputStream);
  DataFormatter dataFormatter = new DataFormatter(java.util.Locale.US);
  FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
  String cellValue = "";
  Sheet sheet = workbook.getSheetAt(0);
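  // the header row defines how many columns every row gets
  Row headerRow = sheet.getRow(0);
  int cellCount = 0;
  if (headerRow != null) {
   cellCount = headerRow.getLastCellNum(); // last cell index + 1, or -1 if the row has no cells
  }
  if (cellCount > 0) {
   for (Row row : sheet) {
    // loop by index: row.cellIterator() would skip cells that don't exist
    for (int c = 0; c < cellCount; c++) {
     // a missing cell is returned as a BLANK cell, so it still produces an (empty) field
     Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
     // evaluates formulas and formats the result the way Excel displays it
     cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
     sb.append(cellValue);
     if (c < cellCount-1) sb.append(delimiter); // no delimiter after the last field
    }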
    sb.append(DEFAULT_LINE_END);
   }
  }
  workbook.close();
  BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(outputStream, java.nio.charset.StandardCharsets.UTF_8));
  bw.append(sb);
  bw.flush();
  bw.close();
 }

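 public static void main(String[] args) throws Exception {
  // try-with-resources closes both streams even if the conversion fails
  try (InputStream in = new FileInputStream("./Excel.xlsx");
       OutputStream out = new FileOutputStream("./Data.txt")) {
   convertXlsToText(in, "\t", out);
  }
 }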
}

No CellType checking and no extra formula evaluation code is needed.

To your other requirement: a delimited text file should contain only the real content, separated by the delimiter. There should be no manipulation of the content. So prepending spaces to the content, or padding it with spaces up to a fixed width, is not a good idea in my opinion. If you use the tab character as the delimiter, for example, then only the tab stops set in the text viewer should affect how the columns line up. Additional padding spaces will only get in the way.
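A small stdlib-only sketch of that point: fields are joined with the delimiter only, with no padding and no delimiter after the last field (the same job as if (c < cellCount-1) in the code above), so a consumer can recover exactly the original values, including empty ones. The class TsvRoundTrip and its methods are hypothetical. Note that String.split needs a negative limit when reading the line back; otherwise trailing empty fields are silently dropped.

```java
import java.util.Arrays;
import java.util.List;

class TsvRoundTrip {
    // Join raw values with the delimiter: no padding, no trailing delimiter.
    static String toLine(List<String> fields, String delimiter) {
        return String.join(delimiter, fields);
    }

    // Split a line back into fields; limit -1 keeps trailing empty fields.
    // The delimiter argument is a regular expression, as in String.split.
    static List<String> fromLine(String line, String delimiterRegex) {
        return Arrays.asList(line.split(delimiterRegex, -1));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("US", "US Oracle AP", "", "31836", "");
        String line = toLine(fields, "\t");
        System.out.println(fromLine(line, "\t").equals(fields)); // true
        System.out.println(line.split("\t").length);             // 4: trailing empty field lost
    }
}
```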
