
Optimising Java Scanner to match regex in file faster

I am currently using the code below to match regexes against a large number of files, but it is fairly slow. Is there a way to do the same thing faster?

public class Filter {
    private String title;
    private String regex;
    private List<String> results = new LinkedList<String>();
    ...
}

I have several Filters for different kinds of regexes, ranging from matching email addresses to matching words such as apikey, and so on. The code will be used to scan for vulnerabilities in decompiled classes and other text-based files.

My code also finds only the first match in each file; I would like to get all matches.

public void startScans() {
    List<File> files = getAllFiles(getFolder()); // all text-based files in the folder
    for (int i = 0; i < files.size(); i++) {
        for (Filter filter : getFilters()) {
            try {
                // i is zero-based, so report i + 1 as the current file number
                System.out.print("\rScanning file " + (i + 1) + " out of " + files.size() + " using filter " + filter.getTitle() + "...");
                scanFile(files.get(i), filter);
            } catch (FileNotFoundException ignored) {
                // the file could not be opened; skip it
            }
        }
    }
}

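private void scanFile(File f, Filter filter) throws FileNotFoundException {
    // try-with-resources closes the Scanner even if the search throws
    try (Scanner scanner = new Scanner(f)) {
        // a horizon of 0 searches the whole input; only the first match is returned
        String result = scanner.findWithinHorizon(filter.getRegex(), 0);
        if (result != null) {
            filter.addResult(result);
        }
    }
}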
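
For the "all matches" part, one possible approach is to compile each regex once with Pattern.compile and loop over Matcher.find(), which moves on to the next match on every call. The following is only a minimal sketch, not the code from the question: the class name AllMatches and the helpers findAll and scanFile are made up for illustration, and it assumes each file fits in memory and can be decoded as UTF-8. Whether this is actually faster than Scanner should be measured on the real files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class AllMatches {

    // Returns every non-overlapping match of an already-compiled pattern in the text.
    static List<String> findAll(Pattern pattern, CharSequence text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {          // each call advances to the next match
            matches.add(matcher.group());
        }
        return matches;
    }

    // Reads the whole file once (assumption: it fits in memory, UTF-8 text)
    // and collects all matches of the pattern.
    static List<String> scanFile(Path file, Pattern pattern) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return findAll(pattern, content);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java AllMatches <file> <regex>");
            return;
        }
        // Compile once and reuse the Pattern for every file that is scanned.
        Pattern pattern = Pattern.compile(args[1]);
        for (String match : scanFile(Paths.get(args[0]), pattern)) {
            System.out.println(match);
        }
    }
}
```

In the Filter class, this would mean storing a compiled Pattern next to the regex string and adding every element of the returned list to the results, instead of only the first match.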

If you want faster execution, you can also use an external tool, that is, run a command. For example:

  • Windows: findstr /R [a-z]*xyz *

  • Linux: egrep -R "[a-z]*xyz" .

NOTE: You can run these commands from Java.
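
Running such a command from Java could look roughly like the sketch below, which uses ProcessBuilder from the standard library. It is an illustration under assumptions: the class ExternalGrep and its methods are hypothetical names, it assumes a grep that supports the -R, -E, -o and -h options (as GNU grep does) is on the PATH, and it uses grep -E in place of egrep. The -o option prints only the matched text and -h suppresses file names, so each output line is one match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
class ExternalGrep {

    // Builds the argument list; passing arguments separately avoids shell quoting issues.
    static List<String> buildCommand(String regex, String folder) {
        return Arrays.asList("grep", "-R", "-E", "-o", "-h", "-e", regex, folder);
    }

    // Starts the command and returns its standard output, one entry per line.
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send the tool's error messages to this process's stderr so they
        // neither mix with the matches nor block the child process.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor(); // wait for the process to exit after its output is consumed
        return lines;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: java ExternalGrep <regex> <folder>");
            return;
        }
        for (String match : run(buildCommand(args[0], args[1]))) {
            System.out.println(match);
        }
    }
}
```

Note that regex syntax differs between grep -E and java.util.regex, so patterns written for the Filter class may need adjusting before being passed to an external tool.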


 