
Get smart Excel table with Apache POI

I'm using the code below to convert a .csv file into .xlsx. It works okay, but the customer wants a "smart Excel table" (with filters etc., as produced by "Format as Table" in Microsoft Excel).

Using Apache NiFi and Groovy:

@Grab("org.apache.poi:poi:3.16")
@Grab("org.apache.poi:poi-ooxml:3.16")
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if(!flowFile)
   return

flowFile = session.write(flowFile, { inputStream, outputStream ->
try {
        SXSSFWorkbook workBook = new SXSSFWorkbook();
        workBook.setCompressTempFiles(true);

        SXSSFSheet sheet = workBook.createSheet("Sheet");
        sheet.setRandomAccessWindowSize(1000);

        String currentLine = null;
        int RowNum = 0;
        BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
        while ((currentLine = br.readLine()) != null) {
            String[] str = currentLine.split(",");            

            Row currentRow = sheet.createRow(RowNum);
            for(int i=0;i<str.length;i++){
                currentRow.createCell(i).setCellValue(str[i]);
            }
            RowNum++;

            if (RowNum % 1000 == 0) {
                println RowNum;
            }
        }
        workBook.write(outputStream);
        workBook.dispose(); // delete the SXSSF temp files; fileOutputStream was never defined, and NiFi manages outputStream itself
    } catch (Exception ex) {
        ex.printStackTrace();
    }


} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

My version looks like this: [image]

The customer wants something like this:

[image]

How can I achieve this?

First: reading CSV files line by line as plain text and then splitting on the delimiter is error-prone. There are CSV rules that this approach does not consider. For example, values might be surrounded by quotes, and there might be spaces between the comma and the next value, which should then not become part of the value. And so on. CSV files should be read using libraries that were made for that purpose, for example opencsv.
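To illustrate the point about quotes and spaces, here is a minimal sketch in plain Java that compares the naive split(",") with a small quote-aware splitter. The names CsvSplitSketch and quoteAwareSplit, as well as the sample line, are made up for this illustration. The splitter only handles the cases named above (quoted values, doubled quotes, spaces around values) and is not a substitute for a real CSV library such as opencsv.

```java
import java.util.ArrayList;
import java.util.List;

class CsvSplitSketch {

    // Naive approach from the question: split on every comma.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Hypothetical quote-aware splitter, for illustration only:
    // commas inside double quotes stay in the field, "" inside quotes
    // is an escaped quote, whitespace outside quotes is dropped.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                wasQuoted = true;
                field.setLength(0);        // drop whitespace before the opening quote
            } else if (c == ',') {
                fields.add(wasQuoted ? field.toString() : field.toString().trim());
                field.setLength(0);
                wasQuoted = false;
            } else if (!wasQuoted) {
                field.append(c);           // unquoted content; trimmed when the field ends
            }
            // characters after a closing quote and before the next comma are ignored
        }
        fields.add(wasQuoted ? field.toString() : field.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma inside a quoted value.
        String line = "   43,   37, \"N\", \"Wisconsin Dells, WI\"";
        System.out.println(naiveSplit(line).length + " pieces from naive split");
        System.out.println(quoteAwareSplit(line));
    }
}
```

For the sample line the naive split cuts the quoted city in two and keeps the quotes and leading spaces in the pieces, while the quote-aware version returns four clean values. A real library additionally deals with line breaks inside quoted values, other separators and escape characters.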

Creating tables in Excel is supported by Apache POI: there is XSSFSheet.createTable. Unfortunately, there is no SXSSFSheet.createTable. And you need the streaming version SXSSF because of the size of your CSVs, right?

To overcome this problem, one can get the underlying XSSFWorkbook out of the SXSSFWorkbook and create the XSSFTable there. The problem with this approach is that, while streaming into the SXSSFSheet, the underlying XSSFSheet does not contain any data. That is why XSSFSheet.createTable(AreaReference) does not find any column names in the first row of the AreaReference and creates a table with the column names "Column1", "Column2", "Column3", ... However, these do not match the actual content of the sheet. That is why we need to update the headers after the table has been created.

Complete example:

import java.io.*;

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;

import com.opencsv.CSVReader;

class CreateTableFromCSV {
    
 static XSSFTable createTable(SXSSFSheet sxssfSheet, AreaReference areaReference, String[] strHeaders) {
  XSSFWorkbook xssfWorkbook = sxssfSheet.getWorkbook().getXSSFWorkbook();
  XSSFSheet xssfSheet = xssfWorkbook.getSheet(sxssfSheet.getSheetName());
  XSSFTable xssfTable = xssfSheet.createTable(areaReference);
  System.out.println(xssfTable.getCTTable()); // wrong column names since xssfSheet does not contain any data until now
  //xssfTable.updateHeaders(); // this cannot work since xssfSheet does not contain any data until now
  for (int i = 0; i < strHeaders.length; i++) {
   String columnHeader = strHeaders[i];
   if (xssfTable.getCTTable().getTableColumns().getTableColumnList().size() > i) xssfTable.getCTTable().getTableColumns().getTableColumnList().get(i).setName(columnHeader); 
  }
  System.out.println(xssfTable.getCTTable()); // headers updated
  return xssfTable;
 }
    
 public static void main(String[] args) throws Exception {

  try (
   SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("./Excel.xlsx");
   CSVReader csvReader = new CSVReader(new FileReader("./cities.csv"));
   //CSVReader csvReader = new CSVReader(new FileReader("./annual-enterprise-survey-2021-financial-year-provisional-csv.csv"));
   //CSVReader csvReader = new CSVReader(new FileReader("./overseas-trade-indexes-September-2022-quarter-provisional-csv.csv"));
   ) {

   sxssfWorkbook.setCompressTempFiles(true);
   SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet("Sheet");
   sxssfSheet.setRandomAccessWindowSize(100);
   
   String[] strHeaders = null;
   String[] dataRow = null;
   int rowNum = 0;
   while ((dataRow = csvReader.readNext()) != null) {
    if (rowNum == 0) strHeaders = dataRow;
    Row currentRow = sxssfSheet.createRow(rowNum);
    for (int i = 0; i < dataRow.length; i++) {
     String cellValue = dataRow[i];
     currentRow.createCell(i).setCellValue(cellValue);
    }
    rowNum++;
    if (rowNum % 1000 == 0) {
     System.out.println(rowNum);
    }
   }

   int lastRow = rowNum -1;
   int lastCol = strHeaders.length -1;
   AreaReference areaReference = new AreaReference(new CellReference(0, 0), new CellReference(lastRow, lastCol), SpreadsheetVersion.EXCEL2007);
   System.out.println(areaReference);
   XSSFTable xssfTable = createTable(sxssfSheet, areaReference, strHeaders);
   //this styles the table as Excel would do per default
   xssfTable.getCTTable().addNewTableStyleInfo();
   XSSFTableStyleInfo style = (XSSFTableStyleInfo)xssfTable.getStyle();
   style.setName("TableStyleLight13");
   style.setShowColumnStripes(false);
   style.setShowRowStripes(true);
   xssfTable.getCTTable().addNewAutoFilter().setRef(areaReference.formatAsString());
   
   sxssfWorkbook.write(fileout);
   sxssfWorkbook.dispose(); 
  }
 }
}

This code is tested and works using the current Apache POI version, 5.2.3.

The content of cities.csv is given here as text:

"LatD", "LatM", "LatS", "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
   41,    5,   59, "N",     80,   39,    0, "W", "Youngstown", OH
   42,   52,   48, "N",     97,   23,   23, "W", "Yankton", SD
   46,   35,   59, "N",    120,   30,   36, "W", "Yakima", WA
   42,   16,   12, "N",     71,   48,    0, "W", "Worcester", MA
   43,   37,   48, "N",     89,   46,   11, "W", "Wisconsin Dells", WI
   36,    5,   59, "N",     80,   15,    0, "W", "Winston-Salem", NC
   49,   52,   48, "N",     97,    9,    0, "W", "Winnipeg", MB
   39,   11,   23, "N",     78,    9,   36, "W", "Winchester", VA
   34,   14,   24, "N",     77,   55,   11, "W", "Wilmington", NC
   39,   45,    0, "N",     75,   33,    0, "W", "Wilmington", DE
   48,    9,    0, "N",    103,   37,   12, "W", "Williston", ND
   41,   15,    0, "N",     77,    0,    0, "W", "Williamsport", PA
   37,   40,   48, "N",     82,   16,   47, "W", "Williamson", WV
   33,   54,    0, "N",     98,   29,   23, "W", "Wichita Falls", TX
   37,   41,   23, "N",     97,   20,   23, "W", "Wichita", KS
   40,    4,   11, "N",     80,   43,   12, "W", "Wheeling", WV
   26,   43,   11, "N",     80,    3,    0, "W", "West Palm Beach", FL
   47,   25,   11, "N",    120,   19,   11, "W", "Wenatchee", WA
   41,   25,   11, "N",    122,   23,   23, "W", "Weed", CA
   31,   13,   11, "N",     82,   20,   59, "W", "Waycross", GA
   44,   57,   35, "N",     89,   38,   23, "W", "Wausau", WI
   42,   21,   36, "N",     87,   49,   48, "W", "Waukegan", IL
   44,   54,    0, "N",     97,    6,   36, "W", "Watertown", SD
   43,   58,   47, "N",     75,   55,   11, "W", "Watertown", NY
   42,   30,    0, "N",     92,   20,   23, "W", "Waterloo", IA
   41,   32,   59, "N",     73,    3,    0, "W", "Waterbury", CT
   38,   53,   23, "N",     77,    1,   47, "W", "Washington", DC
   41,   50,   59, "N",     79,    8,   23, "W", "Warren", PA
   46,    4,   11, "N",    118,   19,   48, "W", "Walla Walla", WA
   31,   32,   59, "N",     97,    8,   23, "W", "Waco", TX
   38,   40,   48, "N",     87,   31,   47, "W", "Vincennes", IN
   28,   48,   35, "N",     97,    0,   36, "W", "Victoria", TX
   32,   20,   59, "N",     90,   52,   47, "W", "Vicksburg", MS
   49,   16,   12, "N",    123,    7,   12, "W", "Vancouver", BC
   46,   55,   11, "N",     98,    0,   36, "W", "Valley City", ND
   30,   49,   47, "N",     83,   16,   47, "W", "Valdosta", GA
   43,    6,   36, "N",     75,   13,   48, "W", "Utica", NY
   39,   54,    0, "N",     79,   43,   48, "W", "Uniontown", PA
   32,   20,   59, "N",     95,   18,    0, "W", "Tyler", TX
   42,   33,   36, "N",    114,   28,   12, "W", "Twin Falls", ID
   33,   12,   35, "N",     87,   34,   11, "W", "Tuscaloosa", AL
   34,   15,   35, "N",     88,   42,   35, "W", "Tupelo", MS
   36,    9,   35, "N",     95,   54,   36, "W", "Tulsa", OK
   32,   13,   12, "N",    110,   58,   12, "W", "Tucson", AZ
   37,   10,   11, "N",    104,   30,   36, "W", "Trinidad", CO
   40,   13,   47, "N",     74,   46,   11, "W", "Trenton", NJ
   44,   45,   35, "N",     85,   37,   47, "W", "Traverse City", MI
   43,   39,    0, "N",     79,   22,   47, "W", "Toronto", ON
   39,    2,   59, "N",     95,   40,   11, "W", "Topeka", KS
   41,   39,    0, "N",     83,   32,   24, "W", "Toledo", OH
   33,   25,   48, "N",     94,    3,    0, "W", "Texarkana", TX
   39,   28,   12, "N",     87,   24,   36, "W", "Terre Haute", IN
   27,   57,    0, "N",     82,   26,   59, "W", "Tampa", FL
   30,   27,    0, "N",     84,   16,   47, "W", "Tallahassee", FL
   47,   14,   24, "N",    122,   25,   48, "W", "Tacoma", WA
   43,    2,   59, "N",     76,    9,    0, "W", "Syracuse", NY
   32,   35,   59, "N",     82,   20,   23, "W", "Swainsboro", GA
   33,   55,   11, "N",     80,   20,   59, "W", "Sumter", SC
   40,   59,   24, "N",     75,   11,   24, "W", "Stroudsburg", PA
   37,   57,   35, "N",    121,   17,   24, "W", "Stockton", CA
   44,   31,   12, "N",     89,   34,   11, "W", "Stevens Point", WI
   40,   21,   36, "N",     80,   37,   12, "W", "Steubenville", OH
   40,   37,   11, "N",    103,   13,   12, "W", "Sterling", CO
   38,    9,    0, "N",     79,    4,   11, "W", "Staunton", VA
   39,   55,   11, "N",     83,   48,   35, "W", "Springfield", OH
   37,   13,   12, "N",     93,   17,   24, "W", "Springfield", MO
   42,    5,   59, "N",     72,   35,   23, "W", "Springfield", MA
   39,   47,   59, "N",     89,   39,    0, "W", "Springfield", IL
   47,   40,   11, "N",    117,   24,   36, "W", "Spokane", WA
   41,   40,   48, "N",     86,   15,    0, "W", "South Bend", IN
   43,   32,   24, "N",     96,   43,   48, "W", "Sioux Falls", SD
   42,   29,   24, "N",     96,   23,   23, "W", "Sioux City", IA
   32,   30,   35, "N",     93,   45,    0, "W", "Shreveport", LA
   33,   38,   23, "N",     96,   36,   36, "W", "Sherman", TX
   44,   47,   59, "N",    106,   57,   35, "W", "Sheridan", WY
   35,   13,   47, "N",     96,   40,   48, "W", "Seminole", OK
   32,   25,   11, "N",     87,    1,   11, "W", "Selma", AL
   38,   42,   35, "N",     93,   13,   48, "W", "Sedalia", MO
   47,   35,   59, "N",    122,   19,   48, "W", "Seattle", WA
   41,   24,   35, "N",     75,   40,   11, "W", "Scranton", PA
   41,   52,   11, "N",    103,   39,   36, "W", "Scottsbluff", NB
   42,   49,   11, "N",     73,   56,   59, "W", "Schenectady", NY
   32,    4,   48, "N",     81,    5,   23, "W", "Savannah", GA
   46,   29,   24, "N",     84,   20,   59, "W", "Sault Sainte Marie", MI
   27,   20,   24, "N",     82,   31,   47, "W", "Sarasota", FL
   38,   26,   23, "N",    122,   43,   12, "W", "Santa Rosa", CA
   35,   40,   48, "N",    105,   56,   59, "W", "Santa Fe", NM
   34,   25,   11, "N",    119,   41,   59, "W", "Santa Barbara", CA
   33,   45,   35, "N",    117,   52,   12, "W", "Santa Ana", CA
   37,   20,   24, "N",    121,   52,   47, "W", "San Jose", CA
   37,   46,   47, "N",    122,   25,   11, "W", "San Francisco", CA
   41,   27,    0, "N",     82,   42,   35, "W", "Sandusky", OH
   32,   42,   35, "N",    117,    9,    0, "W", "San Diego", CA
   34,    6,   36, "N",    117,   18,   35, "W", "San Bernardino", CA
   29,   25,   12, "N",     98,   30,    0, "W", "San Antonio", TX
   31,   27,   35, "N",    100,   26,   24, "W", "San Angelo", TX
   40,   45,   35, "N",    111,   52,   47, "W", "Salt Lake City", UT
   38,   22,   11, "N",     75,   35,   59, "W", "Salisbury", MD
   36,   40,   11, "N",    121,   39,    0, "W", "Salinas", CA
   38,   50,   24, "N",     97,   36,   36, "W", "Salina", KS
   38,   31,   47, "N",    106,    0,    0, "W", "Salida", CO
   44,   56,   23, "N",    123,    1,   47, "W", "Salem", OR
   44,   57,    0, "N",     93,    5,   59, "W", "Saint Paul", MN
   38,   37,   11, "N",     90,   11,   24, "W", "Saint Louis", MO
   39,   46,   12, "N",     94,   50,   23, "W", "Saint Joseph", MO
   42,    5,   59, "N",     86,   28,   48, "W", "Saint Joseph", MI
   44,   25,   11, "N",     72,    1,   11, "W", "Saint Johnsbury", VT
   45,   34,   11, "N",     94,   10,   11, "W", "Saint Cloud", MN
   29,   53,   23, "N",     81,   19,   11, "W", "Saint Augustine", FL
   43,   25,   48, "N",     83,   56,   24, "W", "Saginaw", MI
   38,   35,   24, "N",    121,   29,   23, "W", "Sacramento", CA
   43,   36,   36, "N",     72,   58,   12, "W", "Rutland", VT
   33,   24,    0, "N",    104,   31,   47, "W", "Roswell", NM
   35,   56,   23, "N",     77,   48,    0, "W", "Rocky Mount", NC
   41,   35,   24, "N",    109,   13,   48, "W", "Rock Springs", WY
   42,   16,   12, "N",     89,    5,   59, "W", "Rockford", IL
   43,    9,   35, "N",     77,   36,   36, "W", "Rochester", NY
   44,    1,   12, "N",     92,   27,   35, "W", "Rochester", MN
   37,   16,   12, "N",     79,   56,   24, "W", "Roanoke", VA
   37,   32,   24, "N",     77,   26,   59, "W", "Richmond", VA
   39,   49,   48, "N",     84,   53,   23, "W", "Richmond", IN
   38,   46,   12, "N",    112,    5,   23, "W", "Richfield", UT
   45,   38,   23, "N",     89,   25,   11, "W", "Rhinelander", WI
   39,   31,   12, "N",    119,   48,   35, "W", "Reno", NV
   50,   25,   11, "N",    104,   39,    0, "W", "Regina", SA
   40,   10,   48, "N",    122,   14,   23, "W", "Red Bluff", CA
   40,   19,   48, "N",     75,   55,   48, "W", "Reading", PA
   41,    9,   35, "N",     81,   14,   23, "W", "Ravenna", OH 

Copy and paste it into a text editor, then save it as cities.csv.

Additional CSV files for testing can be downloaded from here: https://www.stats.govt.nz/large-datasets/csv-files-for-download/

Another problem is that Cell.setCellValue is always called with String values, while Excel distinguishes between string and numeric cell values. But this is a well-known problem with CSV. One would need an additional definition file that states which CSV column has which data type.
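As a rough sketch of what such a column definition could look like, the following plain-Java helper decides per column whether a CSV value should be written as a number or as text. ColumnTypeSketch, ColumnType and numericValue are hypothetical names for this illustration and not part of Apache POI or opencsv. Here the definition is simply an array in code; reading it from a separate file would be up to the implementation.

```java
import java.util.OptionalDouble;

class ColumnTypeSketch {

    // Hypothetical per-column type definition.
    enum ColumnType { TEXT, NUMERIC }

    // Returns the numeric value if the column is declared NUMERIC and the
    // text parses as a number; otherwise empty, meaning "write as text".
    static OptionalDouble numericValue(ColumnType type, String raw) {
        if (type != ColumnType.NUMERIC || raw == null) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty(); // e.g. the header text of a numeric column
        }
    }

    public static void main(String[] args) {
        // Assumed definition for three columns in the style of cities.csv.
        ColumnType[] types = { ColumnType.NUMERIC, ColumnType.NUMERIC, ColumnType.TEXT };
        String[] dataRow = { "   41", "5", "Youngstown" };
        for (int i = 0; i < dataRow.length; i++) {
            OptionalDouble number = numericValue(types[i], dataRow[i]);
            // With Apache POI the two branches would be:
            //   cell.setCellValue(number.getAsDouble());  // numeric cell
            //   cell.setCellValue(dataRow[i]);            // string cell
            System.out.println(number.isPresent()
                ? "numeric: " + number.getAsDouble()
                : "text: " + dataRow[i]);
        }
    }
}
```

Because values that do not parse fall back to text, the header row can go through the same code path: the column names stay strings, which is what the table header needs.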

You can look at these methods for reference:

sheet.setAutoFilter()

row.setHeightInPoints()

For the colour you may have to use a cell style.

CellStyle.setFillBackgroundColor()

All of these are documented; going through the documentation may be a good start.
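A minimal sketch of how these calls could fit together, assuming Apache POI 5.x on the classpath. HeaderStyleSketch, styleHeader, the colour, the row height and the output file name are arbitrary choices for illustration. Note that for a solid cell colour the sketch sets the fill foreground colour together with a fill pattern, because setFillBackgroundColor alone does not produce a visible fill.

```java
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class HeaderStyleSketch {

    // Hypothetical helper: adds filter buttons to row 0 and styles it as a header.
    static void styleHeader(Sheet sheet, int lastRow, int lastCol) {
        Workbook workbook = sheet.getWorkbook();

        // Filter drop-downs on the first row, covering rows 0..lastRow and columns 0..lastCol.
        sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, lastCol));

        // A solid fill needs a pattern plus the *foreground* colour.
        CellStyle headerStyle = workbook.createCellStyle();
        headerStyle.setFillForegroundColor(IndexedColors.LIGHT_CORNFLOWER_BLUE.getIndex());
        headerStyle.setFillPattern(FillPatternType.SOLID_FOREGROUND);

        // With SXSSF the header row must still be inside the access window here;
        // otherwise apply the style at the moment the header row is created.
        Row header = sheet.getRow(0);
        header.setHeightInPoints(20f);
        for (Cell cell : header) {
            cell.setCellStyle(headerStyle);
        }
    }

    public static void main(String[] args) throws Exception {
        try (Workbook workbook = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("./Styled.xlsx")) {
            Sheet sheet = workbook.createSheet("Sheet");
            Row header = sheet.createRow(0);
            header.createCell(0).setCellValue("City");
            header.createCell(1).setCellValue("State");
            Row data = sheet.createRow(1);
            data.createCell(0).setCellValue("Tampa");
            data.createCell(1).setCellValue("FL");
            styleHeader(sheet, 1, 1);
            workbook.write(out);
        }
    }
}
```

This gives a filtered, coloured range, but not a real Excel table object. If a table with its own auto filter is created for the same range, the sheet-level setAutoFilter call should be left out.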
