简体   繁体   中英

How to get the usedrange of sheet without VBA?

I use groovy script to works with excel file. And I use POI API to manipulate those files. But in the documentation there is no methods or object that help me to find a way to get the used range of a sheet. I tried to calculate it by my own using methods like getLastRowNum() or getPhysicalNumberOfRows() but none of them works well because they stop counting when they meet an empty rows. Sometimes excel file can have empty rows and after those empty rows they could be filled rows, but those methods just STOP when they just meet one empty rows. So those function will not help me to reach my goal.

So I try another solution. I want to create a named range in the workbook by using the methods createName() then make a named range with a formula that return the usedrange of the actual sheet. But I don't know how to make it, I searched a lot and all I found is about VBA, I don't want to use it because in named range formula we can't use VBA. I found a function call GET.WORKBOOK and I think this could a good start point to search an answer about my problem. This function return the list of worksheet name of the workbook. There no link between my problem and this result but I think that GET object could contain more method like GET.WORKSHEET it's very speculative but I think there is more than just GET.WORKBOOK . ( If you have any informations about this, even if it's not solve my problem please put this in the comment please, I'm really interested in this GET function. )

NB: If you find a way to solve my problem with a groovy-only solution I would be very happy too. I didn't recall this type of solution because I search a lot in this direction but I didn't found anything to help me.

NB2: I adding java tag because groovy and java are very close. And I think someone that can found a solution in java for this problem could do the same in groovy.

NB3: I want a cell reference like A1:B2 to specify the used range

NB4: I re-test the methods getLastRowNum() and it worked perfectly, I made some mistakes in my code that's why it didn't work well. Now here's my new problem, when I use this method I cannot access to a cell that empty with the getCell methods. Here's my code:

import org.apache.poi.ss.usermodel.WorkbookFactory; 

wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx"));

def getUsedRangeByIndex(file_path,ind_ws){
    wb = WorkbookFactory.create(new File(file_path));
    max_col = 0;
    for(int i = 0 ; i < wb.getSheetAt(ind_ws).getLastRowNum() ; i++){
        LOG.info(i.toString())
        if(wb.getSheetAt(ind_ws).getRow(i) != null && wb.getSheetAt(ind_ws).getRow(i).getLastCellNum() > max_col){
              max_col = wb.getSheetAt(ind_ws).getRow(i).getLastCellNum();
        }
    }
  
    return "A1:" + wb.getSheetAt(ind_ws).getRow(wb.getSheetAt(ind_ws).getLastRowNum()).getCell(max_col, RETURN_NULL_AND_BLANK).getReference()
}

LOG.info(getUsedRangeByIndex("./webapps/etlserver/data/files/test_ws.xlsx",0))

I know I have to improve it with some code that calculate the first used cell but for now I will consider A1 as the first used cell.

If the definition of the used range of a worksheet is as follows: ...

The used range is the cell range from first used top left cell to last used bottom right cell.

... and the used Apache POI version is one of the current ones (me using apache poi 5.2.2 ), then the simplest approach to get that used range is usig following methods:

Sheet.getFirstRowNum and Sheet.getLastRowNum to get first used row and last used row in sheet. If one of this return -1 then the sheet does not contain any rows and so does not have a used range.

Then loop over all rows between first used row and last used row and get Row.getFirstCellNum and Row.getLastCellNum . Note the API doc of Row.getLastCellNum : Gets the index of the last cell contained in this row PLUS ONE . If the found first column in that row is lower than before found first columns, then this is the new first column. If the found last column in that row is greater than before found last columns, then this is the new last column.

After that we have first used row, last used row, leftmost used colum and rightmost used column. That is the used range then.

Complete example:

import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellRangeAddress;

class ExcelGetSheetUsedRange {

 /**
  * Simplest method to get the used range from a sheet.
  *
  * @param sheet The sheet to get the used range from.
  + @return CellRangeAddress representing the used range or null for an empty sheet.
  */    
 static CellRangeAddress getUsedRange(Sheet sheet) {
  int firstRow = sheet.getFirstRowNum();
  if (firstRow == -1) return null;
  int lastRow = sheet.getLastRowNum();
  if (lastRow == -1) return null;
  int firstCol = Integer.MAX_VALUE;
  int lastCol = -1;
  for (int r = firstRow; r <= lastRow; r++) {
   Row row = sheet.getRow(r);
   if (row != null) {
    int thisRowFirstCol = row.getFirstCellNum();
    int thisRowLastCol = row.getLastCellNum()-1; // see API doc Row.getLastCellNum : Gets the index of the last cell contained in this row PLUS ONE.
    if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
    if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
   }   
  }
  if (firstCol == Integer.MAX_VALUE) return null;
  if (lastCol == -1) return null;
  return new CellRangeAddress(firstRow, lastRow, firstCol, lastCol);
 }  

 public static void main(String[] args) throws Exception {
  //Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xls"));  
  Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xlsx"));   
  Sheet sheet = workbook.getSheetAt(0);
  CellRangeAddress usedRange = getUsedRange(sheet);
  System.out.println(usedRange);  
 }
}

As told in API doc of Sheet.getLastRowNum :

Note: rows which had content before and were set to empty later might still be counted as rows by Excel and Apache POI...

But that is a problem of Excel wich also may occur when get the used range via Worksheet.UsedRange property .

The solution of Axel Richter is perfect. But here's a pre-build code you can directly insert into a jedox job to makes things work well. It a kind of translation from java to groovy. Here's the code:

import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.ss.util.CellRangeAddress;

wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx")); 
sheet = wb.getSheetAt(0);

def getUsedRange(sheet) {
    firstRow = sheet.getFirstRowNum();
    if (firstRow == -1) return null;
    lastRow = sheet.getLastRowNum();
    if (lastRow == -1) return null;
    firstCol = Integer.MAX_VALUE;
    lastCol = -1;
    for (int r = firstRow; r <= lastRow; r++) {
     row = sheet.getRow(r);
     if (row != null) {
        thisRowFirstCol = row.getFirstCellNum();
        thisRowLastCol = row.getLastCellNum()-1;
        if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
        if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
     }   
    }
    if (firstCol == Integer.MAX_VALUE) return null;
    if (lastCol == -1) return null;
    
    return (new CellRangeAddress(firstRow, lastRow, firstCol, lastCol)).formatAsString();
}   

LOG.info(getUsedRange(sheet));

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM