
Optimising Java Scanner to match regex in file faster

I am currently using this code to match a regex against a lot of files, but it is fairly slow. Is there a way I can do the same thing, but faster?

public class Filter {
    private String title;
    private String regex;
    private List<String> results = new LinkedList<String>();
    ...
}

I have a few Filters for different types of regexes, ranging from matching emails to matching words like apikey, ... The code will be used to scan for vulnerabilities in decompiled classes and other text-based files.

My code also only checks for one match per file; I'd like to get all matches.

public void startScans() {
    List<File> files = getAllFiles(getFolder()); //Gets a list of all text based files in a folder
    for (int i = 0; i < files.size(); i++) {
        for(Filter filter : getFilters()) {
            try {
                System.out.print("\rScanning file " + i + " out of " + files.size() + " using filter " + filter.getTitle() + "...");
                scanFile(files.get(i), filter);
            } catch (FileNotFoundException ignored) {}
        }
    }
}

private void scanFile(File f, Filter filter) throws FileNotFoundException {
    Scanner scanner = new Scanner(f);
    String result = scanner.findWithinHorizon(filter.getRegex(), 0);
    if (result != null) {
        filter.addResult(result);
    }
    scanner.close();
}
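
One possible way to collect every match rather than only the first is to compile each regex once into a `Pattern`, read each file into memory a single time, and loop over `Matcher.find()`. The sketch below is not a measured benchmark. The class name `AllMatches` and both method names are hypothetical, and it assumes the files are UTF-8 text small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    // Returns every non-overlapping match of the pattern in the given text.
    static List<String> findAll(Pattern pattern, CharSequence text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Reads the file once, then applies each precompiled pattern to the same text.
    // Assumption: the file is UTF-8 and fits in memory.
    static List<List<String>> scanFile(Path file, List<Pattern> patterns) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        List<List<String>> perPattern = new ArrayList<>();
        for (Pattern p : patterns) {
            perPattern.add(findAll(p, text));
        }
        return perPattern;
    }
}
```

With this shape, the file loop would be the outer loop and the filters the inner one, so each file is read from disk once regardless of how many filters are configured.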

If you want faster execution, you can also use an external tool, that is, execute a command. For example:

  • Windows: findstr /R [az]*xyz *

  • Linux: egrep -R "[az]*xyz" .

NOTE: You can run these commands from Java.
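
Running such a command from Java could look roughly like the following, using the standard `ProcessBuilder` API. This is a minimal sketch: the class name `ExternalSearch` and the method `run` are hypothetical, and it assumes the tool is on the `PATH` and writes UTF-8 output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ExternalSearch {
    // Runs a command and returns the lines it prints (stderr is merged into stdout).
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        // Wait for the process to exit after its output has been drained.
        process.waitFor();
        return lines;
    }
}
```

For instance, `ExternalSearch.run(Arrays.asList("grep", "-R", "-E", "[az]*xyz", "."))` would invoke the Linux search from the current directory; passing the arguments as a list avoids shell quoting issues.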
