
Java - Filtering rows of an Excel file using POI

I have an Excel file with many rows (more than 60,000), and I want to apply a filter to them so that I only read the rows I am looking for.

I am using the POI library in Java, but I could not find out how to filter values.

For example, my Excel file contains the following data:

First name | Last name | Age
-----------+-----------+----
Jhon       | Doe       |  25
Foo        | Bar       |  20
Aaa        | Doe       |  22

How can I select every row whose last name equals Doe?

Here is my code so far:

public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);

    // Cell range to filter
    CellRangeAddress data = new CellRangeAddress(
            1,
            worksheet.getLastRowNum(),
            0,
            worksheet.getRow(0).getPhysicalNumberOfCells());

    worksheet.setAutoFilter(data);
}

I tried to use AutoFilter, but I do not understand how it works.

I am looking for a feature that looks like this:

Filter filter = new Filter();
filter.setRange(myRange);
filter.addFilter(
    0, // The column index
    "Doe" // The value that I'm searching for
);
filter.apply();

This is purely hypothetical code.

Thanks for your help!

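If your question is how to set the AutoFilter criterion "Doe" for the last name, then this can only be achieved using the underlying low-level ooxml-schemas classes. XSSFAutoFilter is of no use here: as of this writing it does not provide any methods.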

A complete example using your sample data:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STFilterOperator;

import java.io.FileOutputStream;

class AutoFilterSetTest {

 private static void setCellData(Sheet sheet) {

  Object[][] data = new Object[][] {
   new Object[] {"First name", "Last name", "Age"},
   new Object[] {"John", "Doe", 25},
   new Object[] {"Foo", "Bar", 20},
   new Object[] {"Jane", "Doe", 22},
   new Object[] {"Ruth", "Moss", 42},
   new Object[] {"Manuel", "Doe", 32},
   new Object[] {"Axel", "Richter", 56},
  };

  Row row = null;
  Cell cell = null;
  int r = 0;
  int c = 0;
  for (Object[] dataRow : data) {
   row = sheet.createRow(r);
   c = 0;
   for (Object dataValue : dataRow) {
    cell = row.createCell(c);
    if (dataValue instanceof String) {
     cell.setCellValue((String)dataValue);
    } else if (dataValue instanceof Number) {
     cell.setCellValue(((Number)dataValue).doubleValue());
    }
    c++;
   }
   r++;
  }
 }

 private static void setCriteriaFilter(XSSFSheet sheet, int colId, int firstRow, int lastRow, String[] criteria) throws Exception {
  CTAutoFilter ctAutoFilter = sheet.getCTWorksheet().getAutoFilter();
  CTFilterColumn ctFilterColumn = null;
  for (CTFilterColumn filterColumn : ctAutoFilter.getFilterColumnList()) {
   if (filterColumn.getColId() == colId) ctFilterColumn = filterColumn;
  }
  if (ctFilterColumn == null) ctFilterColumn = ctAutoFilter.addNewFilterColumn();
  ctFilterColumn.setColId(colId);
  if (ctFilterColumn.isSetFilters()) ctFilterColumn.unsetFilters();

  CTFilters ctFilters = ctFilterColumn.addNewFilters();
  for (int i = 0; i < criteria.length; i++) {
   ctFilters.addNewFilter().setVal(criteria[i]);
  }

  //hide the rows that do not match any of the criteria
  DataFormatter dataformatter = new DataFormatter();
  for (int r = firstRow; r <= lastRow; r++) {
   XSSFRow row = sheet.getRow(r);
   boolean hidden = true;
   for (int i = 0; i < criteria.length; i++) {
    String cellValue = dataformatter.formatCellValue(row.getCell(colId));
    if (criteria[i].equals(cellValue)) hidden = false;
   }
   if (hidden) {
    row.getCTRow().setHidden(hidden);
   } else {
    if (row.getCTRow().getHidden()) row.getCTRow().unsetHidden();
   }
  }
 }

 public static void main(String[] args) throws Exception {

  XSSFWorkbook wb = new XSSFWorkbook();
  XSSFSheet sheet = wb.createSheet();

  //create rows of data
  setCellData(sheet);

  for (int c = 0; c < 2; c++) sheet.autoSizeColumn(c);

  int lastRow = sheet.getLastRowNum();
  XSSFAutoFilter autofilter = sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, 2));
  //XSSFAutoFilter is useless until now

  //set filter criteria 
  setCriteriaFilter(sheet, 1, 1, lastRow, new String[]{"Doe"});

  //get only visible rows after filtering
  XSSFRow row = null;
  for (int r = 1; r <= lastRow; r++) {
   row = sheet.getRow(r);
   if (row.getCTRow().getHidden()) continue;
   for (int c = 0; c < 3; c++) {
    System.out.print(row.getCell(c) + "\t");
   }
   System.out.println();
  }

  FileOutputStream out = new FileOutputStream("AutoFilterSetTest.xlsx");
  wb.write(out);
  out.close();
  wb.close();
 }
}

It prints:

John    Doe   25.0  
Jane    Doe   22.0  
Manuel  Doe   32.0  

The resulting AutoFilterSetTest.xlsx looks like:

[Screenshot of the resulting workbook]

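Maybe this can help others, so here is the solution I came up with before this answer.
Bear in mind that I am not very good at Java, so the code below can certainly be optimized.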

I implemented a filter myself. To do so, I created 3 classes:

  • ExcelWorksheetFilter
  • FilterRule
  • FilterRuleOperation

ExcelWorksheetFilter

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.util.ArrayList;
import java.util.List;

public class ExcelWorksheetFilter {

    private List<FilterRule> ruleList = new ArrayList<>();
    private CellRangeAddress cellRange;
    private XSSFSheet worksheet;
    private XSSFWorkbook workbook;

    public ExcelWorksheetFilter(XSSFWorkbook workbook, int worksheetId) {
        this.workbook = workbook;
        this.worksheet = workbook.getSheetAt(worksheetId);
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * A row is kept in the result if at least one rule matches.
     */
    public void apply(){

        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(true);
            for(FilterRule rule : ruleList){
                if(rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(false);
                    break;
                }
            }
        }
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * A row is kept in the result only if every rule matches.
     */
    public void applyStrict(){
        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(false);
            for(FilterRule rule : ruleList){
                if(!rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(true);
                    break;
                }
            }
        }
    }

    public List<Row> getRowList(){
        List<Row> rowList = new ArrayList<>();

        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            if(!worksheet.getRow(rowId).getCTRow().getHidden()){
                rowList.add(worksheet.getRow(rowId));
            }
        }

        return rowList;
    }

    public void addRule(FilterRule rule) {
        this.ruleList.add(rule);
    }

    // Getters and setters omitted...
}

FilterRule

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.XSSFRow;

public class FilterRule {

    private final static DataFormatter df = new DataFormatter();

    private Integer columnId;
    private String[] values;
    private FilterRuleOperation operator;

    public FilterRule(Integer columnId, FilterRuleOperation operator, String[] values){
        this.columnId = columnId;
        this.operator = operator;
        this.values = values;
    }

    /**
     * Returns true if at least one of the values matches.
     * @param row The row to match
     * @return a boolean
     */
    public boolean match(XSSFRow row){
        for(String value : values){
            if(operator.match(df.formatCellValue(row.getCell(columnId)), value)){
                return true;
            }
        }
        return false;
    }
}

FilterRuleOperation

public enum FilterRuleOperation {

    DIFFERENT("!="){
        @Override public boolean match(String x, String y){
            return !x.equals(y);
        }
    },
    EQUAL("=="){
        @Override public boolean match(String x, String y){
            return x.equals(y);
        }
    };

    private final String text;

    private FilterRuleOperation(String text) {
        this.text = text;
    }

    public abstract boolean match(String x, String y);

    @Override public String toString() {
        return text;
    }
}

Then you can use it almost as described in the OP.
For example, with this Excel file:
[Screenshot of the Excel data]

And this code:

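public void parseExcelFile(XSSFWorkbook myExcelFile) throws IOException {
    // Use the same sheet index for the cell range and for the filter
    int worksheetId = 0;
    XSSFSheet worksheet = myExcelFile.getSheetAt(worksheetId);

    // Create the filter
    ExcelWorksheetFilter excelWorksheetFilter = new ExcelWorksheetFilter(myExcelFile, worksheetId);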
    excelWorksheetFilter.setCellRange(new CellRangeAddress(
        1, // Exclude the row with columns titles
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells()-1
    ));

    // Create rules for filtering
    excelWorksheetFilter.addRule(new FilterRule(
            1, // Last name column
            FilterRuleOperation.EQUAL,
            new String[]{"Doe"}
            ));

    excelWorksheetFilter.addRule(new FilterRule(
            0, // First name column
            FilterRuleOperation.EQUAL,
            new String[]{"Jhon"}
    ));

    // Apply with applyStrict function puts a AND condition between rules
    excelWorksheetFilter.applyStrict();
    // You can also use apply function it puts a OR condition between rules
    // excelWorksheetFilter.apply();
    
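    // The DataFormatter in FilterRule is private, so create one here for printing
    DataFormatter df = new DataFormatter();
    excelWorksheetFilter.getRowList().forEach(row -> {
        for(int i = 0; i < 3; i++) {
            System.out.print(df.formatCellValue(row.getCell(i)) + '\t');
        }
        System.out.println();
    });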

    // Save the file
    FileOutputStream out = new FileOutputStream("filter_test.xlsx");
    excelWorksheetFilter.getWorkbook().write(out);
    out.close();
    excelWorksheetFilter.getWorkbook().close();
}

This prints:

Jhon    Doe 25


If you use excelWorksheetFilter.apply() instead, it prints:

Jhon    Doe    25   
Aaa     Doe    22   
Jhon    Smith  30
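
The difference between the two methods comes down to "any rule matches" versus "every rule matches". As a rough, POI-free sketch of that logic, the snippet below uses plain String[] rows and java.util.function.Predicate in place of Row and FilterRule. RuleCombiner, matchAny and matchAll are hypothetical names that are not part of the classes above, and the sample rows are only assumed from the printed output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring apply() (OR) and applyStrict() (AND) on plain string rows. */
final class RuleCombiner {
    private RuleCombiner() {}

    /** Keeps a row if at least one rule accepts it, like apply(). */
    static List<String[]> matchAny(List<String[]> rows, List<Predicate<String[]>> rules) {
        return rows.stream()
                .filter(row -> rules.stream().anyMatch(rule -> rule.test(row)))
                .collect(Collectors.toList());
    }

    /** Keeps a row only if every rule accepts it, like applyStrict(). */
    static List<String[]> matchAll(List<String[]> rows, List<Predicate<String[]>> rules) {
        return rows.stream()
                .filter(row -> rules.stream().allMatch(rule -> rule.test(row)))
                .collect(Collectors.toList());
    }
}

class RuleCombinerDemo {
    public static void main(String[] args) {
        // Sample rows assumed from the output shown above (first name, last name, age)
        List<String[]> rows = Arrays.asList(
                new String[]{"Jhon", "Doe", "25"},
                new String[]{"Foo", "Bar", "20"},
                new String[]{"Aaa", "Doe", "22"},
                new String[]{"Jhon", "Smith", "30"});

        List<Predicate<String[]>> rules = Arrays.asList(
                row -> "Doe".equals(row[1]),   // last name rule
                row -> "Jhon".equals(row[0])); // first name rule

        System.out.println(RuleCombiner.matchAll(rows, rules).size()); // prints 1
        System.out.println(RuleCombiner.matchAny(rows, rules).size()); // prints 3
    }
}
```

With an empty rule list, matchAll keeps every row and matchAny keeps none, which is also how applyStrict() and apply() behave as written.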


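The two main drawbacks are:

  • It does not use Excel's own filters, so the Excel file is harder to work with afterwards.
  • It is not memory-efficient, because ExcelWorksheetFilter.getRowList() returns a list rather than an iterator.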
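
One possible way around the second drawback is to hand out an iterator that skips unwanted elements instead of building a list. The sketch below is generic and uses only the JDK, so it is not tied to POI. FilteringIterator is a hypothetical class, not part of the code above. In the filter class it could be fed the sheet's row iterator together with a predicate that checks the hidden flag, the way getRowList() does.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical helper: yields only the elements of a source iterator that pass a predicate. */
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> keep;
    private T next;          // the next accepted element, if any
    private boolean hasNext; // whether 'next' currently holds an element

    FilteringIterator(Iterator<T> source, Predicate<? super T> keep) {
        this.source = source;
        this.keep = keep;
        advance(); // look ahead by one accepted element
    }

    /** Moves forward in the source until an accepted element is found or the source ends. */
    private void advance() {
        hasNext = false;
        next = null;
        while (source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override public boolean hasNext() {
        return hasNext;
    }

    @Override public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }
}

class FilteringIteratorDemo {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"Jhon", "Doe", "25"},
                new String[]{"Foo", "Bar", "20"},
                new String[]{"Aaa", "Doe", "22"});
        Iterator<String[]> it = new FilteringIterator<>(rows.iterator(), r -> "Doe".equals(r[1]));
        while (it.hasNext()) {
            System.out.println(String.join("\t", it.next())); // prints the two Doe rows
        }
    }
}
```

Only one accepted element is held at a time, so the caller never needs a full list of matches. The workbook itself is still fully loaded in memory, so this only avoids the extra list.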

Also, it only works with strings, but I think it could be adapted to other data types.
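
As an illustration of how such an adaptation might look for numbers, here is a minimal sketch of a second enum that parses both strings before comparing them. NumericFilterRuleOperation and its constants are hypothetical and not part of the code above. It also assumes that the formatted cell text is a plain number such as "25", which depends on the cell's number format.

```java
/** Hypothetical companion to FilterRuleOperation: compares cell text numerically. */
enum NumericFilterRuleOperation {

    GREATER_THAN(">") {
        @Override protected boolean match(double x, double y) {
            return x > y;
        }
    },
    LESS_THAN("<") {
        @Override protected boolean match(double x, double y) {
            return x < y;
        }
    };

    private final String text;

    NumericFilterRuleOperation(String text) {
        this.text = text;
    }

    /** Comparison on the parsed values, implemented by each constant. */
    protected abstract boolean match(double x, double y);

    /** Parses both arguments as numbers; text that is not numeric never matches. */
    public boolean match(String cellText, String ruleValue) {
        try {
            return match(Double.parseDouble(cellText.trim()),
                         Double.parseDouble(ruleValue.trim()));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    @Override public String toString() {
        return text;
    }
}

class NumericFilterDemo {
    public static void main(String[] args) {
        System.out.println(NumericFilterRuleOperation.GREATER_THAN.match("25", "21"));  // prints true
        System.out.println(NumericFilterRuleOperation.GREATER_THAN.match("Doe", "21")); // prints false
    }
}
```

Because match(String, String) has the same shape as in FilterRuleOperation, one option would be to extract a small shared interface so that FilterRule can accept either enum.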

