Java - Filtering rows of an Excel file using POI
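I have an Excel file with a lot of rows (more than 60,000), and I want to apply a filter to them so that I only read the rows I am looking for.
I am using the POI library in Java, but I could not find out how to filter values.
For example, my Excel file contains the following data: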
First name | Last name | Age
-----------+-----------+----
Jhon | Doe | 25
Foo | Bar | 20
Aaa | Doe | 22
How can I select every row whose last name equals Doe?
Here is my code so far:
public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);
    // Cell range to filter
    CellRangeAddress data = new CellRangeAddress(
        1,
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells());
    worksheet.setAutoFilter(data);
}
I tried using AutoFilter, but I do not understand how it works.
I am looking for a feature that looks something like this:
Filter filter = new Filter();
filter.setRange(myRange);
filter.addFilter(
    0,    // The column index
    "Doe" // The value that I'm searching for
);
filter.apply();
This is purely hypothetical code.
Thanks for your help!
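If your question is how to set an AutoFilter with the criterion "Doe" on the last name column, then this is only possible using the underlying low-level ooxml-schemas classes. XSSFAutoFilter is of no use here: up to now it does not provide any methods.
Complete example using your sample data: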
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilters;
import java.io.FileOutputStream;

class AutoFilterSetTest {
    private static void setCellData(Sheet sheet) {
        Object[][] data = new Object[][] {
            new Object[] {"First name", "Last name", "Age"},
            new Object[] {"John", "Doe", 25},
            new Object[] {"Foo", "Bar", 20},
            new Object[] {"Jane", "Doe", 22},
            new Object[] {"Ruth", "Moss", 42},
            new Object[] {"Manuel", "Doe", 32},
            new Object[] {"Axel", "Richter", 56},
        };
        Row row = null;
        Cell cell = null;
        int r = 0;
        int c = 0;
        for (Object[] dataRow : data) {
            row = sheet.createRow(r);
            c = 0;
            for (Object dataValue : dataRow) {
                cell = row.createCell(c);
                if (dataValue instanceof String) {
                    cell.setCellValue((String)dataValue);
                } else if (dataValue instanceof Number) {
                    cell.setCellValue(((Number)dataValue).doubleValue());
                }
                c++;
            }
            r++;
        }
    }
    private static void setCriteriaFilter(XSSFSheet sheet, int colId, int firstRow, int lastRow, String[] criteria) throws Exception {
        CTAutoFilter ctAutoFilter = sheet.getCTWorksheet().getAutoFilter();
        // reuse the filter column for colId if one already exists
        CTFilterColumn ctFilterColumn = null;
        for (CTFilterColumn filterColumn : ctAutoFilter.getFilterColumnList()) {
            if (filterColumn.getColId() == colId) ctFilterColumn = filterColumn;
        }
        if (ctFilterColumn == null) ctFilterColumn = ctAutoFilter.addNewFilterColumn();
        ctFilterColumn.setColId(colId);
        if (ctFilterColumn.isSetFilters()) ctFilterColumn.unsetFilters();
        CTFilters ctFilters = ctFilterColumn.addNewFilters();
        for (int i = 0; i < criteria.length; i++) {
            ctFilters.addNewFilter().setVal(criteria[i]);
        }
        // hide the rows that do not match any of the criteria
        DataFormatter dataformatter = new DataFormatter();
        for (int r = firstRow; r <= lastRow; r++) {
            XSSFRow row = sheet.getRow(r);
            if (row == null) continue; // skip rows that do not exist in the sheet
            String cellValue = dataformatter.formatCellValue(row.getCell(colId));
            boolean hidden = true;
            for (int i = 0; i < criteria.length; i++) {
                if (criteria[i].equals(cellValue)) hidden = false;
            }
            if (hidden) {
                row.getCTRow().setHidden(hidden);
            } else {
                if (row.getCTRow().getHidden()) row.getCTRow().unsetHidden();
            }
        }
    }
    public static void main(String[] args) throws Exception {
        XSSFWorkbook wb = new XSSFWorkbook();
        XSSFSheet sheet = wb.createSheet();
        //create rows of data
        setCellData(sheet);
        for (int c = 0; c < 3; c++) sheet.autoSizeColumn(c);
        int lastRow = sheet.getLastRowNum();
        XSSFAutoFilter autofilter = sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, 2));
        //XSSFAutoFilter is useless until now
        //set filter criteria
        setCriteriaFilter(sheet, 1, 1, lastRow, new String[]{"Doe"});
        //get only visible rows after filtering
        XSSFRow row = null;
        for (int r = 1; r <= lastRow; r++) {
            row = sheet.getRow(r);
            if (row.getCTRow().getHidden()) continue;
            for (int c = 0; c < 3; c++) {
                System.out.print(row.getCell(c) + "\t");
            }
            System.out.println();
        }
        //write the workbook; try-with-resources closes the stream even on failure
        try (FileOutputStream out = new FileOutputStream("AutoFilterSetTest.xlsx")) {
            wb.write(out);
        }
        wb.close();
    }
}
It prints:
John Doe 25.0
Jane Doe 22.0
Manuel Doe 32.0
In the resulting AutoFilterSetTest.xlsx, the Last name column is filtered to "Doe" and the rows that do not match are hidden.
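Maybe this can help others, so here is the solution I came up with before the answer above was posted.
Bear in mind that I am not very good at Java, so the code below can certainly be optimized.
I implemented a filter myself, and for that I created 3 classes: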
ExcelWorksheetFilter
FilterRule
FilterRuleOperation
ExcelWorksheetFilter
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.util.ArrayList;
import java.util.List;

public class ExcelWorksheetFilter {
    private List<FilterRule> ruleList = new ArrayList<>();
    private CellRangeAddress cellRange;
    private XSSFSheet worksheet;
    private XSSFWorkbook workbook;

    public ExcelWorksheetFilter(XSSFWorkbook workbook, int worksheetId) {
        this.workbook = workbook;
        this.worksheet = workbook.getSheetAt(worksheetId);
    }

    /**
     * Apply the rules of ruleList to the worksheet.
     * A row is kept in the result if at least one rule matches.
     */
    public void apply() {
        for (int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++) {
            XSSFRow row = worksheet.getRow(rowId);
            if (row == null) continue; // skip rows that do not exist in the sheet
            row.getCTRow().setHidden(true);
            for (FilterRule rule : ruleList) {
                if (rule.match(row)) {
                    row.getCTRow().setHidden(false);
                    break;
                }
            }
        }
    }

    /**
     * Apply the rules of ruleList to the worksheet.
     * A row is kept in the result only if every rule matches.
     */
    public void applyStrict() {
        for (int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++) {
            XSSFRow row = worksheet.getRow(rowId);
            if (row == null) continue; // skip rows that do not exist in the sheet
            row.getCTRow().setHidden(false);
            for (FilterRule rule : ruleList) {
                if (!rule.match(row)) {
                    row.getCTRow().setHidden(true);
                    break;
                }
            }
        }
    }

    /** Returns the rows of the cell range that are not hidden. */
    public List<Row> getRowList() {
        List<Row> rowList = new ArrayList<>();
        for (int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++) {
            XSSFRow row = worksheet.getRow(rowId);
            if (row != null && !row.getCTRow().getHidden()) {
                rowList.add(row);
            }
        }
        return rowList;
    }

    public void addRule(FilterRule rule) {
        this.ruleList.add(rule);
    }

    // Getters and setters omitted...
}
FilterRule
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.XSSFRow;

public class FilterRule {
    private static final DataFormatter df = new DataFormatter();
    private Integer columnId;
    private String[] values;
    private FilterRuleOperation operator;

    public FilterRule(Integer columnId, FilterRuleOperation operator, String[] values) {
        this.columnId = columnId;
        this.operator = operator;
        this.values = values;
    }

    /**
     * Returns true if at least one of the values matches.
     * @param row The row to match
     * @return true if the cell in columnId matches any of the values
     */
    public boolean match(XSSFRow row) {
        // the cell is compared through its formatted text, as Excel would display it
        String cellValue = df.formatCellValue(row.getCell(columnId));
        for (String value : values) {
            if (operator.match(cellValue, value)) {
                return true;
            }
        }
        return false;
    }
}
FilterRuleOperation
public enum FilterRuleOperation {
    DIFFERENT("!=") {
        @Override public boolean match(String x, String y) {
            return !x.equals(y);
        }
    },
    EQUAL("==") {
        @Override public boolean match(String x, String y) {
            return x.equals(y);
        }
    };

    private final String text;

    private FilterRuleOperation(String text) {
        this.text = text;
    }

    public abstract boolean match(String x, String y);

    @Override public String toString() {
        return text;
    }
}
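The operations above compare the formatted cell text as plain strings. As a rough sketch of how a numeric comparison could sit next to them, the hypothetical enum below (NumericFilterOperation and its constants are not part of the original code) keeps the same match(String, String) shape but parses both sides as numbers first. It assumes the formatted text is a plain number such as "25"; text with grouping separators or units would simply not match.

```java
import java.math.BigDecimal;

// Hypothetical companion to FilterRuleOperation: compares the cell text numerically.
enum NumericFilterOperation {
    LESS_THAN("<") {
        @Override boolean matchNumbers(BigDecimal x, BigDecimal y) {
            return x.compareTo(y) < 0;
        }
    },
    GREATER_OR_EQUAL(">=") {
        @Override boolean matchNumbers(BigDecimal x, BigDecimal y) {
            return x.compareTo(y) >= 0;
        }
    };

    private final String text;

    NumericFilterOperation(String text) {
        this.text = text;
    }

    abstract boolean matchNumbers(BigDecimal x, BigDecimal y);

    // Same signature as FilterRuleOperation.match; text that is not a number never matches.
    public boolean match(String x, String y) {
        try {
            return matchNumbers(new BigDecimal(x.trim()), new BigDecimal(y.trim()));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    @Override public String toString() {
        return text;
    }

    public static void main(String[] args) {
        // "9" sorts after "10" as text, but is smaller as a number
        System.out.println(LESS_THAN.match("9", "10"));
        System.out.println(GREATER_OR_EQUAL.match("Doe", "10"));
    }
}
```

To plug something like this into FilterRule, both enums would need a shared interface declaring match(String, String); that wiring is left out here.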
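You can then use the filter almost as described in the question.
For example, with an Excel file that, judging by the output further down, contains at least the rows Jhon | Doe | 25, Aaa | Doe | 22 and Jhon | Smith | 30,
and this code: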
// needs org.apache.poi.ss.usermodel.DataFormatter, java.io.FileOutputStream and java.io.IOException
public void parseExcelFile(XSSFWorkbook myExcelFile) throws IOException {
    // Same sheet index as the one passed to the filter below
    XSSFSheet worksheet = myExcelFile.getSheetAt(0);
    DataFormatter df = new DataFormatter();
    // Create the filter
    ExcelWorksheetFilter excelWorksheetFilter = new ExcelWorksheetFilter(myExcelFile, 0);
    excelWorksheetFilter.setCellRange(new CellRangeAddress(
        1, // Exclude the row with the column titles
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells() - 1
    ));
    // Create rules for filtering
    excelWorksheetFilter.addRule(new FilterRule(
        1, // Last name column
        FilterRuleOperation.EQUAL,
        new String[]{"Doe"}
    ));
    excelWorksheetFilter.addRule(new FilterRule(
        0, // First name column
        FilterRuleOperation.EQUAL,
        new String[]{"Jhon"}
    ));
    // applyStrict() puts an AND condition between the rules
    excelWorksheetFilter.applyStrict();
    // apply() would put an OR condition between the rules instead
    // excelWorksheetFilter.apply();
    excelWorksheetFilter.getRowList().forEach(row -> {
        for (int i = 0; i < 3; i++) {
            System.out.print(df.formatCellValue(row.getCell(i)) + '\t');
        }
        System.out.println();
    });
    // Save the file; try-with-resources closes the stream even on failure
    try (FileOutputStream out = new FileOutputStream("filter_test.xlsx")) {
        excelWorksheetFilter.getWorkbook().write(out);
    }
    excelWorksheetFilter.getWorkbook().close();
}
This will print:
Jhon Doe 25
If you use excelWorksheetFilter.apply() instead, it will print:
Jhon Doe 25
Aaa Doe 22
Jhon Smith 30
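The difference between the two outputs comes down to "every rule matches" versus "at least one rule matches". A minimal sketch of that logic without POI, using plain string arrays as rows and java.util.function.Predicate as rules, might look like this. The class name, the helper methods and the four sample rows are assumptions made for illustration; the rows are pieced together from the question and the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RuleCombinationSketch {

    // OR semantics, like apply(): keep a row if at least one rule matches.
    static List<String[]> matchAny(List<String[]> rows, List<Predicate<String[]>> rules) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (rules.stream().anyMatch(rule -> rule.test(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // AND semantics, like applyStrict(): keep a row only if every rule matches.
    static List<String[]> matchAll(List<String[]> rows, List<Predicate<String[]>> rules) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (rules.stream().allMatch(rule -> rule.test(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Assumed sample data: first name, last name, age.
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Jhon", "Doe", "25"});
        rows.add(new String[] {"Foo", "Bar", "20"});
        rows.add(new String[] {"Aaa", "Doe", "22"});
        rows.add(new String[] {"Jhon", "Smith", "30"});
        return rows;
    }

    // The same two rules as in the example: last name is Doe, first name is Jhon.
    static List<Predicate<String[]>> sampleRules() {
        List<Predicate<String[]>> rules = new ArrayList<>();
        rules.add(row -> "Doe".equals(row[1]));
        rules.add(row -> "Jhon".equals(row[0]));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println("AND:");
        for (String[] row : matchAll(sampleRows(), sampleRules())) {
            System.out.println(String.join("\t", row));
        }
        System.out.println("OR:");
        for (String[] row : matchAny(sampleRows(), sampleRules())) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

With these assumed rows, the AND variant keeps only Jhon Doe, while the OR variant keeps Jhon Doe, Aaa Doe and Jhon Smith, which mirrors the two outputs shown above. Note that with an empty rule list the AND variant keeps every row and the OR variant keeps none, the same as applyStrict() and apply().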
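The two main drawbacks are:
the ExcelWorksheetFilter.getRowList() function returns a list rather than an iterator. Also, it only works with strings, although I think it could be adapted to other types of data.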
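For the first drawback, one possible direction is a small wrapper that filters lazily while iterating instead of building a list up front. The generic class below is only a sketch (FilteringIterator is a hypothetical name, not part of POI or of the code above). Since a POI Sheet is an Iterable of Row, its iterator could serve as the source, with a predicate that checks the hidden flag the same way getRowList() does.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Wraps any iterator and lazily yields only the elements accepted by the predicate.
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> keep;
    private T buffered;          // next accepted element, if one has been found
    private boolean hasBuffered; // true while 'buffered' holds an unreturned element

    FilteringIterator(Iterator<T> source, Predicate<? super T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override public boolean hasNext() {
        // advance the source until an accepted element is found or the source is exhausted
        while (!hasBuffered && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                buffered = candidate;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = buffered;
        buffered = null;
        hasBuffered = false;
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> names = new FilteringIterator<>(
                Arrays.asList("Doe", "Bar", "Doe", "Smith").iterator(),
                name -> name.equals("Doe"));
        while (names.hasNext()) {
            System.out.println(names.next());
        }
    }
}
```

Only one element is buffered at a time, so for a sheet with tens of thousands of rows no second list of matching rows has to be held in memory.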