[英]Get smart Excel table with Apache POI
I'm using code below to convert.csv file into.xlsx.我正在使用下面的代码将 .csv 文件转换为 .xlsx。 It works okay, but customer want to get "smart Excel table" (with filters etc, format as table in Microsoft Excel).
它工作正常,但客户想要获得“智能 Excel 表”(带有过滤器等,格式为 Microsoft Excel 中的表)。
Using Apache NiFI and Groovy:使用 Apache NiFI 和 Groovy:
@Grab("org.apache.poi:poi:3.16")
@Grab("org.apache.poi:poi-ooxml:3.16")
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
def flowFile = session.get()
if(!flowFile)
return
flowFile = session.write(flowFile, { inputStream, outputStream ->
try {
SXSSFWorkbook workBook = new SXSSFWorkbook();
workBook.setCompressTempFiles(true);
SXSSFSheet sheet = workBook.createSheet("Sheet");
sheet.setRandomAccessWindowSize(1000);
String currentLine = null;
int RowNum = 0;
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
while ((currentLine = br.readLine()) != null) {
String[] str = currentLine.split(",");
Row currentRow = sheet.createRow(RowNum);
for(int i=0;i<str.length;i++){
currentRow.createCell(i).setCellValue(str[i]);
}
RowNum++;
if (RowNum % 1000 == 0) {
println RowNum;
}
}
workBook.write(outputStream);
fileOutputStream.close();
} catch (Exception ex) {
ex.printStackTrace();
}
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
My version looks like:我的版本看起来像:
Customer want to get smth like this:客户想要得到这样的东西:
How can I achieve this?我怎样才能做到这一点?
At first: Reading CSV files row by row as plain text files and the split by delimiter is error prone.一开始:把CSV个文件当成纯文本文件逐行读取,用分隔符分割容易出错。 There are CSV rules which are not considered by this approach.
此方法未考虑 CSV 条规则。 For example: Values might be surrounded by quotes.
例如:值可能被引号括起来。 There might be spaces between the comma and the next value, which then not should be part of the value.
逗号和下一个值之间可能有空格,那么它不应该是值的一部分。 And so on.
等等。 CSV files should be read using libraries which were made for that.
CSV 文件应该使用为此制作的库来读取。 There is opencsv for example.
例如有opencsv 。
Creating tables in Excel is part of Apache POI.在 Excel 中创建表是 Apache POI 的一部分。 There is XSSFSheet.createTable .
有XSSFSheet.createTable 。 Unfortunately there is no
SXSSFSheet.createTable
.不幸的是没有
SXSSFSheet.createTable
。 And you need the streaming version SXSSF
because of the size of your CSVs, right?由于 CSV 的大小,您需要流媒体版本
SXSSF
,对吗?
To overcome this problem, one can get the underlying XSSFWorkbook
out of the SXSSFWorkbook
to crate the XSSFTable
there.为了克服这个问题,可以从
SXSSFWorkbook
中获取底层XSSFWorkbook
以在那里XSSFTable
。 Problem with this approach is, that while streaming into the SXSSFSheet
the underlying XSSFSheet
does not contain any data.这种方法的问题是,当流式传输到
SXSSFSheet
时,底层XSSFSheet
不包含任何数据。 That's why XSSFSheet.createTable(AreaReference)
does not find any column names in first row of the AreaReference
and creates a table having column names "Column1", "Column2", "Column3", .... However, these do not match the actual content of the sheet.这就是为什么
XSSFSheet.createTable(AreaReference)
在AreaReference
的第一行中找不到任何列名称并创建一个列名称为“Column1”、“Column2”、“Column3”...的表。但是,这些与表的实际内容。 That's why we need updating the headers after table was created.这就是为什么我们需要在创建表后更新表头。
Complete example:完整示例:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;
import com.opencsv.CSVReader;
class CreateTableFromCSV {
static XSSFTable createTable(SXSSFSheet sxssfSheet, AreaReference areaReference, String[] strHeaders) {
XSSFWorkbook xssfWorkbook = sxssfSheet.getWorkbook().getXSSFWorkbook();
XSSFSheet xssfSheet = xssfWorkbook.getSheet(sxssfSheet.getSheetName());
XSSFTable xssfTable = xssfSheet.createTable(areaReference);
System.out.println(xssfTable.getCTTable()); // wrong column names since xssfSheet does not contain any data until now
//xssfTable.updateHeaders(); // this cannot work since xssfSheet does not contain any data until now
for (int i = 0; i < strHeaders.length; i++) {
String columnHeader = strHeaders[i];
if (xssfTable.getCTTable().getTableColumns().getTableColumnList().size() > i) xssfTable.getCTTable().getTableColumns().getTableColumnList().get(i).setName(columnHeader);
}
System.out.println(xssfTable.getCTTable()); // headers updated
return xssfTable;
}
public static void main(String[] args) throws Exception {
try (
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("./Excel.xlsx");
CSVReader csvReader = new CSVReader(new FileReader("./cities.csv"));
//CSVReader csvReader = new CSVReader(new FileReader("./annual-enterprise-survey-2021-financial-year-provisional-csv.csv"));
//CSVReader csvReader = new CSVReader(new FileReader("./overseas-trade-indexes-September-2022-quarter-provisional-csv.csv"));
) {
sxssfWorkbook.setCompressTempFiles(true);
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet("Sheet");
sxssfSheet.setRandomAccessWindowSize(100);
String[] strHeaders = null;
String[] dataRow = null;
int rowNum = 0;
while ((dataRow = csvReader.readNext()) != null) {
if (rowNum == 0) strHeaders = dataRow;
Row currentRow = sxssfSheet.createRow(rowNum);
for (int i = 0; i < dataRow.length; i++) {
String cellValue = dataRow[i];
currentRow.createCell(i).setCellValue(cellValue);
}
rowNum++;
if (rowNum % 1000 == 0) {
System.out.println(rowNum);
}
}
int lastRow = rowNum -1;
int lastCol = strHeaders.length -1;
AreaReference areaReference = new AreaReference(new CellReference(0, 0), new CellReference(lastRow, lastCol), SpreadsheetVersion.EXCEL2007);
System.out.println(areaReference);
XSSFTable xssfTable = createTable(sxssfSheet, areaReference, strHeaders);
//this styles the table as Excel would do per default
xssfTable.getCTTable().addNewTableStyleInfo();
XSSFTableStyleInfo style = (XSSFTableStyleInfo)xssfTable.getStyle();
style.setName("TableStyleLight13");
style.setShowColumnStripes(false);
style.setShowRowStripes(true);
xssfTable.getCTTable().addNewAutoFilter().setRef(areaReference.formatAsString());
sxssfWorkbook.write(fileout);
sxssfWorkbook.dispose();
}
}
}
This code is tested and works using current Apache POI version apache poi 5.2.3
.此代码已使用当前 Apache POI 版本
apache poi 5.2.3
进行测试和工作。
The content of the cities.csv
is here as text: cities.csv
的内容在这里作为文本:
"LatD", "LatM", "LatS", "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
41, 5, 59, "N", 80, 39, 0, "W", "Youngstown", OH
42, 52, 48, "N", 97, 23, 23, "W", "Yankton", SD
46, 35, 59, "N", 120, 30, 36, "W", "Yakima", WA
42, 16, 12, "N", 71, 48, 0, "W", "Worcester", MA
43, 37, 48, "N", 89, 46, 11, "W", "Wisconsin Dells", WI
36, 5, 59, "N", 80, 15, 0, "W", "Winston-Salem", NC
49, 52, 48, "N", 97, 9, 0, "W", "Winnipeg", MB
39, 11, 23, "N", 78, 9, 36, "W", "Winchester", VA
34, 14, 24, "N", 77, 55, 11, "W", "Wilmington", NC
39, 45, 0, "N", 75, 33, 0, "W", "Wilmington", DE
48, 9, 0, "N", 103, 37, 12, "W", "Williston", ND
41, 15, 0, "N", 77, 0, 0, "W", "Williamsport", PA
37, 40, 48, "N", 82, 16, 47, "W", "Williamson", WV
33, 54, 0, "N", 98, 29, 23, "W", "Wichita Falls", TX
37, 41, 23, "N", 97, 20, 23, "W", "Wichita", KS
40, 4, 11, "N", 80, 43, 12, "W", "Wheeling", WV
26, 43, 11, "N", 80, 3, 0, "W", "West Palm Beach", FL
47, 25, 11, "N", 120, 19, 11, "W", "Wenatchee", WA
41, 25, 11, "N", 122, 23, 23, "W", "Weed", CA
31, 13, 11, "N", 82, 20, 59, "W", "Waycross", GA
44, 57, 35, "N", 89, 38, 23, "W", "Wausau", WI
42, 21, 36, "N", 87, 49, 48, "W", "Waukegan", IL
44, 54, 0, "N", 97, 6, 36, "W", "Watertown", SD
43, 58, 47, "N", 75, 55, 11, "W", "Watertown", NY
42, 30, 0, "N", 92, 20, 23, "W", "Waterloo", IA
41, 32, 59, "N", 73, 3, 0, "W", "Waterbury", CT
38, 53, 23, "N", 77, 1, 47, "W", "Washington", DC
41, 50, 59, "N", 79, 8, 23, "W", "Warren", PA
46, 4, 11, "N", 118, 19, 48, "W", "Walla Walla", WA
31, 32, 59, "N", 97, 8, 23, "W", "Waco", TX
38, 40, 48, "N", 87, 31, 47, "W", "Vincennes", IN
28, 48, 35, "N", 97, 0, 36, "W", "Victoria", TX
32, 20, 59, "N", 90, 52, 47, "W", "Vicksburg", MS
49, 16, 12, "N", 123, 7, 12, "W", "Vancouver", BC
46, 55, 11, "N", 98, 0, 36, "W", "Valley City", ND
30, 49, 47, "N", 83, 16, 47, "W", "Valdosta", GA
43, 6, 36, "N", 75, 13, 48, "W", "Utica", NY
39, 54, 0, "N", 79, 43, 48, "W", "Uniontown", PA
32, 20, 59, "N", 95, 18, 0, "W", "Tyler", TX
42, 33, 36, "N", 114, 28, 12, "W", "Twin Falls", ID
33, 12, 35, "N", 87, 34, 11, "W", "Tuscaloosa", AL
34, 15, 35, "N", 88, 42, 35, "W", "Tupelo", MS
36, 9, 35, "N", 95, 54, 36, "W", "Tulsa", OK
32, 13, 12, "N", 110, 58, 12, "W", "Tucson", AZ
37, 10, 11, "N", 104, 30, 36, "W", "Trinidad", CO
40, 13, 47, "N", 74, 46, 11, "W", "Trenton", NJ
44, 45, 35, "N", 85, 37, 47, "W", "Traverse City", MI
43, 39, 0, "N", 79, 22, 47, "W", "Toronto", ON
39, 2, 59, "N", 95, 40, 11, "W", "Topeka", KS
41, 39, 0, "N", 83, 32, 24, "W", "Toledo", OH
33, 25, 48, "N", 94, 3, 0, "W", "Texarkana", TX
39, 28, 12, "N", 87, 24, 36, "W", "Terre Haute", IN
27, 57, 0, "N", 82, 26, 59, "W", "Tampa", FL
30, 27, 0, "N", 84, 16, 47, "W", "Tallahassee", FL
47, 14, 24, "N", 122, 25, 48, "W", "Tacoma", WA
43, 2, 59, "N", 76, 9, 0, "W", "Syracuse", NY
32, 35, 59, "N", 82, 20, 23, "W", "Swainsboro", GA
33, 55, 11, "N", 80, 20, 59, "W", "Sumter", SC
40, 59, 24, "N", 75, 11, 24, "W", "Stroudsburg", PA
37, 57, 35, "N", 121, 17, 24, "W", "Stockton", CA
44, 31, 12, "N", 89, 34, 11, "W", "Stevens Point", WI
40, 21, 36, "N", 80, 37, 12, "W", "Steubenville", OH
40, 37, 11, "N", 103, 13, 12, "W", "Sterling", CO
38, 9, 0, "N", 79, 4, 11, "W", "Staunton", VA
39, 55, 11, "N", 83, 48, 35, "W", "Springfield", OH
37, 13, 12, "N", 93, 17, 24, "W", "Springfield", MO
42, 5, 59, "N", 72, 35, 23, "W", "Springfield", MA
39, 47, 59, "N", 89, 39, 0, "W", "Springfield", IL
47, 40, 11, "N", 117, 24, 36, "W", "Spokane", WA
41, 40, 48, "N", 86, 15, 0, "W", "South Bend", IN
43, 32, 24, "N", 96, 43, 48, "W", "Sioux Falls", SD
42, 29, 24, "N", 96, 23, 23, "W", "Sioux City", IA
32, 30, 35, "N", 93, 45, 0, "W", "Shreveport", LA
33, 38, 23, "N", 96, 36, 36, "W", "Sherman", TX
44, 47, 59, "N", 106, 57, 35, "W", "Sheridan", WY
35, 13, 47, "N", 96, 40, 48, "W", "Seminole", OK
32, 25, 11, "N", 87, 1, 11, "W", "Selma", AL
38, 42, 35, "N", 93, 13, 48, "W", "Sedalia", MO
47, 35, 59, "N", 122, 19, 48, "W", "Seattle", WA
41, 24, 35, "N", 75, 40, 11, "W", "Scranton", PA
41, 52, 11, "N", 103, 39, 36, "W", "Scottsbluff", NB
42, 49, 11, "N", 73, 56, 59, "W", "Schenectady", NY
32, 4, 48, "N", 81, 5, 23, "W", "Savannah", GA
46, 29, 24, "N", 84, 20, 59, "W", "Sault Sainte Marie", MI
27, 20, 24, "N", 82, 31, 47, "W", "Sarasota", FL
38, 26, 23, "N", 122, 43, 12, "W", "Santa Rosa", CA
35, 40, 48, "N", 105, 56, 59, "W", "Santa Fe", NM
34, 25, 11, "N", 119, 41, 59, "W", "Santa Barbara", CA
33, 45, 35, "N", 117, 52, 12, "W", "Santa Ana", CA
37, 20, 24, "N", 121, 52, 47, "W", "San Jose", CA
37, 46, 47, "N", 122, 25, 11, "W", "San Francisco", CA
41, 27, 0, "N", 82, 42, 35, "W", "Sandusky", OH
32, 42, 35, "N", 117, 9, 0, "W", "San Diego", CA
34, 6, 36, "N", 117, 18, 35, "W", "San Bernardino", CA
29, 25, 12, "N", 98, 30, 0, "W", "San Antonio", TX
31, 27, 35, "N", 100, 26, 24, "W", "San Angelo", TX
40, 45, 35, "N", 111, 52, 47, "W", "Salt Lake City", UT
38, 22, 11, "N", 75, 35, 59, "W", "Salisbury", MD
36, 40, 11, "N", 121, 39, 0, "W", "Salinas", CA
38, 50, 24, "N", 97, 36, 36, "W", "Salina", KS
38, 31, 47, "N", 106, 0, 0, "W", "Salida", CO
44, 56, 23, "N", 123, 1, 47, "W", "Salem", OR
44, 57, 0, "N", 93, 5, 59, "W", "Saint Paul", MN
38, 37, 11, "N", 90, 11, 24, "W", "Saint Louis", MO
39, 46, 12, "N", 94, 50, 23, "W", "Saint Joseph", MO
42, 5, 59, "N", 86, 28, 48, "W", "Saint Joseph", MI
44, 25, 11, "N", 72, 1, 11, "W", "Saint Johnsbury", VT
45, 34, 11, "N", 94, 10, 11, "W", "Saint Cloud", MN
29, 53, 23, "N", 81, 19, 11, "W", "Saint Augustine", FL
43, 25, 48, "N", 83, 56, 24, "W", "Saginaw", MI
38, 35, 24, "N", 121, 29, 23, "W", "Sacramento", CA
43, 36, 36, "N", 72, 58, 12, "W", "Rutland", VT
33, 24, 0, "N", 104, 31, 47, "W", "Roswell", NM
35, 56, 23, "N", 77, 48, 0, "W", "Rocky Mount", NC
41, 35, 24, "N", 109, 13, 48, "W", "Rock Springs", WY
42, 16, 12, "N", 89, 5, 59, "W", "Rockford", IL
43, 9, 35, "N", 77, 36, 36, "W", "Rochester", NY
44, 1, 12, "N", 92, 27, 35, "W", "Rochester", MN
37, 16, 12, "N", 79, 56, 24, "W", "Roanoke", VA
37, 32, 24, "N", 77, 26, 59, "W", "Richmond", VA
39, 49, 48, "N", 84, 53, 23, "W", "Richmond", IN
38, 46, 12, "N", 112, 5, 23, "W", "Richfield", UT
45, 38, 23, "N", 89, 25, 11, "W", "Rhinelander", WI
39, 31, 12, "N", 119, 48, 35, "W", "Reno", NV
50, 25, 11, "N", 104, 39, 0, "W", "Regina", SA
40, 10, 48, "N", 122, 14, 23, "W", "Red Bluff", CA
40, 19, 48, "N", 75, 55, 48, "W", "Reading", PA
41, 9, 35, "N", 81, 14, 23, "W", "Ravenna", OH
Copy/paste in text editor.在文本编辑器中复制/粘贴。 Then save as
cities.csv
.然后另存为
cities.csv
。
Additional CSV files to test you might download from here: https://www.stats.govt.nz/large-datasets/csv-files-for-download/ .您可以从此处下载其他 CSV 文件进行测试: https://www.stats.govt.nz/large-datasets/csv-files-for-download/ 。
Another problem is using Cell.setCellValue
always using String values while Excel differs between string and numeric cell values.另一个问题是使用
Cell.setCellValue
总是使用字符串值,而 Excel 在字符串和数字单元格值之间存在差异。 But this is a well known problem using CSV. One would need an additional definition file which shows which CSV column is of which data type.但这是使用 CSV 的一个众所周知的问题。需要一个额外的定义文件来显示哪个 CSV 列是哪种数据类型。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.