
Scanning for files whose names contain a given string

I have a simple piece of code that uses Tesseract OCR to read the text in a given image and count how many lines it produces. I would also like to search a directory for documents whose file names contain a given string (such as M000123456), count how many match, and compare that count to the number Tesseract outputs. The documents are named like so: M000123456_V987654_05-07-2000.pdf. What's the best way to do this?

import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Main {
    public static void main(String[] args) throws TesseractException {
        Tesseract tesseract = new Tesseract();

        // If Tess4J cannot locate the tessdata folder on its own,
        // point it there with tesseract.setDatapath(...) before calling doOCR.

        // Path of the image file to read
        String text
                = tesseract.doOCR(new File("C:\\Users\\mmx0409\\Downloads\\testimage.png"));

        // Count the number of lines Tesseract saw (String.lines() needs Java 11+)
        System.out.println(text.lines().count());
    }
}


You can use the function below to count the documents whose names contain searchString.

public int countDocuments(String directoryPath, String searchString) {
    File folder = new File(directoryPath);

    // listFiles() returns null if the path is not a directory or cannot be read
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles == null) {
        return 0;
    }

    int count = 0;

    for (File file : listOfFiles) {
        // Only count regular files, not subdirectories
        if (file.isFile() && file.getName().contains(searchString)) {
            count++;
        }
    }

    return count;
}
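Since the file names in the question start with the ID followed by an underscore (M000123456_V987654_05-07-2000.pdf), a prefix match may be a safer test than contains(), which would also count names that merely embed the string. Below is a minimal sketch of that variant using java.nio, including the comparison against the OCR line count. The class name DocumentCounter, the method countByPrefix, and the command-line arguments are hypothetical names chosen for illustration; the sketch also assumes the documents sit directly in one directory, not in subfolders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Tess4J or the original answer.
class DocumentCounter {

    // Counts regular files directly inside dir whose name starts with id + "_".
    // Assumption: names follow the ID_Vnnnnnn_date.pdf pattern from the question.
    static long countByPrefix(Path dir, String id) throws IOException {
        // Files.list returns a lazy stream that must be closed after use
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.startsWith(id + "_"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: DocumentCounter <directory> <id> <ocrLineCount>");
            return;
        }
        Path dir = Paths.get(args[0]);
        String id = args[1];
        // In the question's program this value would be text.lines().count()
        long ocrLines = Long.parseLong(args[2]);

        long matches = countByPrefix(dir, id);
        System.out.println("Documents found: " + matches + ", OCR lines: " + ocrLines);
        System.out.println(matches == ocrLines ? "Counts match" : "Counts differ");
    }
}
```

In the question's Main class, the same method could be called right after doOCR, passing the ID and comparing the result with text.lines().count(). If the ID can appear anywhere in the name, swapping startsWith for contains gives the same behavior as countDocuments above.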

