简体   繁体   中英

Get smart Excel table with Apache POI

I'm using code below to convert.csv file into.xlsx. It works okay, but customer want to get "smart Excel table" (with filters etc, format as table in Microsoft Excel).

Using Apache NiFI and Groovy:

@Grab("org.apache.poi:poi:3.16")
@Grab("org.apache.poi:poi-ooxml:3.16")
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if(!flowFile)
   return

flowFile = session.write(flowFile, { inputStream, outputStream ->
try {
        SXSSFWorkbook workBook = new SXSSFWorkbook();
        workBook.setCompressTempFiles(true);

        SXSSFSheet sheet = workBook.createSheet("Sheet");
        sheet.setRandomAccessWindowSize(1000);

        String currentLine = null;
        int RowNum = 0;
        BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
        while ((currentLine = br.readLine()) != null) {
            String[] str = currentLine.split(",");            

            Row currentRow = sheet.createRow(RowNum);
            for(int i=0;i<str.length;i++){
                currentRow.createCell(i).setCellValue(str[i]);
            }
            RowNum++;

            if (RowNum % 1000 == 0) {
                println RowNum;
            }
        }
        workBook.write(outputStream);
        fileOutputStream.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    }


} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

My version looks like: 在此处输入图像描述

Customer want to get smth like this:

在此处输入图像描述

How can I achieve this?

At first: Reading CSV files row by row as plain text files and the split by delimiter is error prone. There are CSV rules which are not considered by this approach. For example: Values might be surrounded by quotes. There might be spaces between the comma and the next value, which then not should be part of the value. And so on. CSV files should be read using libraries which were made for that. There is opencsv for example.

Creating tables in Excel is part of Apache POI. There is XSSFSheet.createTable . Unfortunately there is no SXSSFSheet.createTable . And you need the streaming version SXSSF because of the size of your CSVs, right?

To overcome this problem, one can get the underlying XSSFWorkbook out of the SXSSFWorkbook to crate the XSSFTable there. Problem with this approach is, that while streaming into the SXSSFSheet the underlying XSSFSheet does not contain any data. That's why XSSFSheet.createTable(AreaReference) does not find any column names in first row of the AreaReference and creates a table having column names "Column1", "Column2", "Column3", .... However, these do not match the actual content of the sheet. That's why we need updating the headers after table was created.

Complete example:

import java.io.*;

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;

import com.opencsv.CSVReader;

class CreateTableFromCSV {
    
 static XSSFTable createTable(SXSSFSheet sxssfSheet, AreaReference areaReference, String[] strHeaders) {
  XSSFWorkbook xssfWorkbook = sxssfSheet.getWorkbook().getXSSFWorkbook();
  XSSFSheet xssfSheet = xssfWorkbook.getSheet(sxssfSheet.getSheetName());
  XSSFTable xssfTable = xssfSheet.createTable(areaReference);
  System.out.println(xssfTable.getCTTable()); // wrong column names since xssfSheet does not contain any data until now
  //xssfTable.updateHeaders(); // this cannot work since xssfSheet does not contain any data until now
  for (int i = 0; i < strHeaders.length; i++) {
   String columnHeader = strHeaders[i];
   if (xssfTable.getCTTable().getTableColumns().getTableColumnList().size() > i) xssfTable.getCTTable().getTableColumns().getTableColumnList().get(i).setName(columnHeader); 
  }
  System.out.println(xssfTable.getCTTable()); // headers updated
  return xssfTable;
 }
    
 public static void main(String[] args) throws Exception {

  try (
   SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("./Excel.xlsx");
   CSVReader csvReader = new CSVReader(new FileReader("./cities.csv"));
   //CSVReader csvReader = new CSVReader(new FileReader("./annual-enterprise-survey-2021-financial-year-provisional-csv.csv"));
   //CSVReader csvReader = new CSVReader(new FileReader("./overseas-trade-indexes-September-2022-quarter-provisional-csv.csv"));
   ) {

   sxssfWorkbook.setCompressTempFiles(true);
   SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet("Sheet");
   sxssfSheet.setRandomAccessWindowSize(100);
   
   String[] strHeaders = null;
   String[] dataRow = null;
   int rowNum = 0;
   while ((dataRow = csvReader.readNext()) != null) {
    if (rowNum == 0) strHeaders = dataRow;
    Row currentRow = sxssfSheet.createRow(rowNum);
    for (int i = 0; i < dataRow.length; i++) {
     String cellValue = dataRow[i];
     currentRow.createCell(i).setCellValue(cellValue);
    }
    rowNum++;
    if (rowNum % 1000 == 0) {
     System.out.println(rowNum);
    }
   }

   int lastRow = rowNum -1;
   int lastCol = strHeaders.length -1;
   AreaReference areaReference = new AreaReference(new CellReference(0, 0), new CellReference(lastRow, lastCol), SpreadsheetVersion.EXCEL2007);
   System.out.println(areaReference);
   XSSFTable xssfTable = createTable(sxssfSheet, areaReference, strHeaders);
   //this styles the table as Excel would do per default
   xssfTable.getCTTable().addNewTableStyleInfo();
   XSSFTableStyleInfo style = (XSSFTableStyleInfo)xssfTable.getStyle();
   style.setName("TableStyleLight13");
   style.setShowColumnStripes(false);
   style.setShowRowStripes(true);
   xssfTable.getCTTable().addNewAutoFilter().setRef(areaReference.formatAsString());
   
   sxssfWorkbook.write(fileout);
   sxssfWorkbook.dispose(); 
  }
 }
}

This code is tested and works using current Apache POI version apache poi 5.2.3 .

The content of the cities.csv is here as text:

"LatD", "LatM", "LatS", "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
   41,    5,   59, "N",     80,   39,    0, "W", "Youngstown", OH
   42,   52,   48, "N",     97,   23,   23, "W", "Yankton", SD
   46,   35,   59, "N",    120,   30,   36, "W", "Yakima", WA
   42,   16,   12, "N",     71,   48,    0, "W", "Worcester", MA
   43,   37,   48, "N",     89,   46,   11, "W", "Wisconsin Dells", WI
   36,    5,   59, "N",     80,   15,    0, "W", "Winston-Salem", NC
   49,   52,   48, "N",     97,    9,    0, "W", "Winnipeg", MB
   39,   11,   23, "N",     78,    9,   36, "W", "Winchester", VA
   34,   14,   24, "N",     77,   55,   11, "W", "Wilmington", NC
   39,   45,    0, "N",     75,   33,    0, "W", "Wilmington", DE
   48,    9,    0, "N",    103,   37,   12, "W", "Williston", ND
   41,   15,    0, "N",     77,    0,    0, "W", "Williamsport", PA
   37,   40,   48, "N",     82,   16,   47, "W", "Williamson", WV
   33,   54,    0, "N",     98,   29,   23, "W", "Wichita Falls", TX
   37,   41,   23, "N",     97,   20,   23, "W", "Wichita", KS
   40,    4,   11, "N",     80,   43,   12, "W", "Wheeling", WV
   26,   43,   11, "N",     80,    3,    0, "W", "West Palm Beach", FL
   47,   25,   11, "N",    120,   19,   11, "W", "Wenatchee", WA
   41,   25,   11, "N",    122,   23,   23, "W", "Weed", CA
   31,   13,   11, "N",     82,   20,   59, "W", "Waycross", GA
   44,   57,   35, "N",     89,   38,   23, "W", "Wausau", WI
   42,   21,   36, "N",     87,   49,   48, "W", "Waukegan", IL
   44,   54,    0, "N",     97,    6,   36, "W", "Watertown", SD
   43,   58,   47, "N",     75,   55,   11, "W", "Watertown", NY
   42,   30,    0, "N",     92,   20,   23, "W", "Waterloo", IA
   41,   32,   59, "N",     73,    3,    0, "W", "Waterbury", CT
   38,   53,   23, "N",     77,    1,   47, "W", "Washington", DC
   41,   50,   59, "N",     79,    8,   23, "W", "Warren", PA
   46,    4,   11, "N",    118,   19,   48, "W", "Walla Walla", WA
   31,   32,   59, "N",     97,    8,   23, "W", "Waco", TX
   38,   40,   48, "N",     87,   31,   47, "W", "Vincennes", IN
   28,   48,   35, "N",     97,    0,   36, "W", "Victoria", TX
   32,   20,   59, "N",     90,   52,   47, "W", "Vicksburg", MS
   49,   16,   12, "N",    123,    7,   12, "W", "Vancouver", BC
   46,   55,   11, "N",     98,    0,   36, "W", "Valley City", ND
   30,   49,   47, "N",     83,   16,   47, "W", "Valdosta", GA
   43,    6,   36, "N",     75,   13,   48, "W", "Utica", NY
   39,   54,    0, "N",     79,   43,   48, "W", "Uniontown", PA
   32,   20,   59, "N",     95,   18,    0, "W", "Tyler", TX
   42,   33,   36, "N",    114,   28,   12, "W", "Twin Falls", ID
   33,   12,   35, "N",     87,   34,   11, "W", "Tuscaloosa", AL
   34,   15,   35, "N",     88,   42,   35, "W", "Tupelo", MS
   36,    9,   35, "N",     95,   54,   36, "W", "Tulsa", OK
   32,   13,   12, "N",    110,   58,   12, "W", "Tucson", AZ
   37,   10,   11, "N",    104,   30,   36, "W", "Trinidad", CO
   40,   13,   47, "N",     74,   46,   11, "W", "Trenton", NJ
   44,   45,   35, "N",     85,   37,   47, "W", "Traverse City", MI
   43,   39,    0, "N",     79,   22,   47, "W", "Toronto", ON
   39,    2,   59, "N",     95,   40,   11, "W", "Topeka", KS
   41,   39,    0, "N",     83,   32,   24, "W", "Toledo", OH
   33,   25,   48, "N",     94,    3,    0, "W", "Texarkana", TX
   39,   28,   12, "N",     87,   24,   36, "W", "Terre Haute", IN
   27,   57,    0, "N",     82,   26,   59, "W", "Tampa", FL
   30,   27,    0, "N",     84,   16,   47, "W", "Tallahassee", FL
   47,   14,   24, "N",    122,   25,   48, "W", "Tacoma", WA
   43,    2,   59, "N",     76,    9,    0, "W", "Syracuse", NY
   32,   35,   59, "N",     82,   20,   23, "W", "Swainsboro", GA
   33,   55,   11, "N",     80,   20,   59, "W", "Sumter", SC
   40,   59,   24, "N",     75,   11,   24, "W", "Stroudsburg", PA
   37,   57,   35, "N",    121,   17,   24, "W", "Stockton", CA
   44,   31,   12, "N",     89,   34,   11, "W", "Stevens Point", WI
   40,   21,   36, "N",     80,   37,   12, "W", "Steubenville", OH
   40,   37,   11, "N",    103,   13,   12, "W", "Sterling", CO
   38,    9,    0, "N",     79,    4,   11, "W", "Staunton", VA
   39,   55,   11, "N",     83,   48,   35, "W", "Springfield", OH
   37,   13,   12, "N",     93,   17,   24, "W", "Springfield", MO
   42,    5,   59, "N",     72,   35,   23, "W", "Springfield", MA
   39,   47,   59, "N",     89,   39,    0, "W", "Springfield", IL
   47,   40,   11, "N",    117,   24,   36, "W", "Spokane", WA
   41,   40,   48, "N",     86,   15,    0, "W", "South Bend", IN
   43,   32,   24, "N",     96,   43,   48, "W", "Sioux Falls", SD
   42,   29,   24, "N",     96,   23,   23, "W", "Sioux City", IA
   32,   30,   35, "N",     93,   45,    0, "W", "Shreveport", LA
   33,   38,   23, "N",     96,   36,   36, "W", "Sherman", TX
   44,   47,   59, "N",    106,   57,   35, "W", "Sheridan", WY
   35,   13,   47, "N",     96,   40,   48, "W", "Seminole", OK
   32,   25,   11, "N",     87,    1,   11, "W", "Selma", AL
   38,   42,   35, "N",     93,   13,   48, "W", "Sedalia", MO
   47,   35,   59, "N",    122,   19,   48, "W", "Seattle", WA
   41,   24,   35, "N",     75,   40,   11, "W", "Scranton", PA
   41,   52,   11, "N",    103,   39,   36, "W", "Scottsbluff", NB
   42,   49,   11, "N",     73,   56,   59, "W", "Schenectady", NY
   32,    4,   48, "N",     81,    5,   23, "W", "Savannah", GA
   46,   29,   24, "N",     84,   20,   59, "W", "Sault Sainte Marie", MI
   27,   20,   24, "N",     82,   31,   47, "W", "Sarasota", FL
   38,   26,   23, "N",    122,   43,   12, "W", "Santa Rosa", CA
   35,   40,   48, "N",    105,   56,   59, "W", "Santa Fe", NM
   34,   25,   11, "N",    119,   41,   59, "W", "Santa Barbara", CA
   33,   45,   35, "N",    117,   52,   12, "W", "Santa Ana", CA
   37,   20,   24, "N",    121,   52,   47, "W", "San Jose", CA
   37,   46,   47, "N",    122,   25,   11, "W", "San Francisco", CA
   41,   27,    0, "N",     82,   42,   35, "W", "Sandusky", OH
   32,   42,   35, "N",    117,    9,    0, "W", "San Diego", CA
   34,    6,   36, "N",    117,   18,   35, "W", "San Bernardino", CA
   29,   25,   12, "N",     98,   30,    0, "W", "San Antonio", TX
   31,   27,   35, "N",    100,   26,   24, "W", "San Angelo", TX
   40,   45,   35, "N",    111,   52,   47, "W", "Salt Lake City", UT
   38,   22,   11, "N",     75,   35,   59, "W", "Salisbury", MD
   36,   40,   11, "N",    121,   39,    0, "W", "Salinas", CA
   38,   50,   24, "N",     97,   36,   36, "W", "Salina", KS
   38,   31,   47, "N",    106,    0,    0, "W", "Salida", CO
   44,   56,   23, "N",    123,    1,   47, "W", "Salem", OR
   44,   57,    0, "N",     93,    5,   59, "W", "Saint Paul", MN
   38,   37,   11, "N",     90,   11,   24, "W", "Saint Louis", MO
   39,   46,   12, "N",     94,   50,   23, "W", "Saint Joseph", MO
   42,    5,   59, "N",     86,   28,   48, "W", "Saint Joseph", MI
   44,   25,   11, "N",     72,    1,   11, "W", "Saint Johnsbury", VT
   45,   34,   11, "N",     94,   10,   11, "W", "Saint Cloud", MN
   29,   53,   23, "N",     81,   19,   11, "W", "Saint Augustine", FL
   43,   25,   48, "N",     83,   56,   24, "W", "Saginaw", MI
   38,   35,   24, "N",    121,   29,   23, "W", "Sacramento", CA
   43,   36,   36, "N",     72,   58,   12, "W", "Rutland", VT
   33,   24,    0, "N",    104,   31,   47, "W", "Roswell", NM
   35,   56,   23, "N",     77,   48,    0, "W", "Rocky Mount", NC
   41,   35,   24, "N",    109,   13,   48, "W", "Rock Springs", WY
   42,   16,   12, "N",     89,    5,   59, "W", "Rockford", IL
   43,    9,   35, "N",     77,   36,   36, "W", "Rochester", NY
   44,    1,   12, "N",     92,   27,   35, "W", "Rochester", MN
   37,   16,   12, "N",     79,   56,   24, "W", "Roanoke", VA
   37,   32,   24, "N",     77,   26,   59, "W", "Richmond", VA
   39,   49,   48, "N",     84,   53,   23, "W", "Richmond", IN
   38,   46,   12, "N",    112,    5,   23, "W", "Richfield", UT
   45,   38,   23, "N",     89,   25,   11, "W", "Rhinelander", WI
   39,   31,   12, "N",    119,   48,   35, "W", "Reno", NV
   50,   25,   11, "N",    104,   39,    0, "W", "Regina", SA
   40,   10,   48, "N",    122,   14,   23, "W", "Red Bluff", CA
   40,   19,   48, "N",     75,   55,   48, "W", "Reading", PA
   41,    9,   35, "N",     81,   14,   23, "W", "Ravenna", OH 

Copy/paste in text editor. Then save as cities.csv .

Additional CSV files to test you might download from here: https://www.stats.govt.nz/large-datasets/csv-files-for-download/ .

Another problem is using Cell.setCellValue always using String values while Excel differs between string and numeric cell values. But this is a well known problem using CSV. One would need an additional definition file which shows which CSV column is of which data type.

You can look at these methods for reference:

sheet.setAutoFilter()

row.setHeightInPoints()

For the colour you may have to use a cell style.

CellStyle.setFillBackgroundColor()

All of these are documented, perhaps go through this as a start.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM