
In Java, how can I read data from multiple files using POSIX wildcard syntax?

Currently, I have a program that loops over System.in for data processing. I am passing data to it from several files with cat:

cat myfiles*.txt | java MyDataProcessor 

Based on the idea that cat adds some inefficiency vs. Java opening the files directly, I'd like to optimize this to where Java opens the files directly:

java MyDataProcessor myfiles*.txt

Are there any Java libraries that make this fairly easy (i.e. that handle the translation of POSIX wildcards into file handles)?

Java 7 added the PathMatcher interface, which can be used to test a path name against a glob (the matching is similar to that done by your shell):

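Path filename = Paths.get("myfiles1.txt"); // example name; matches() takes a java.nio.file.Path
PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:myfiles*.txt");
// This glob has no directory part, so test a bare file name (e.g. path.getFileName())
boolean matched = matcher.matches(filename);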

An example of walking a file tree and searching for files based on globs can be found in the Oracle Java tutorials.
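As a rough illustration of the tree-walking idea, here is a minimal sketch that combines PathMatcher with Files.walk (which needs Java 8 or later; on Java 7, Files.walkFileTree would be the equivalent). The class name GlobWalker and the method findMatching are made up for this example and are not taken from the tutorial.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named for this example only.
class GlobWalker {

    /** Returns all regular files under root whose file name matches the glob, sorted. */
    static List<Path> findMatching(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        // Files.walk returns a lazy stream that must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Match the file name only: a glob without a directory
                    // part would not match the full path.
                    .filter(p -> matcher.matches(p.getFileName()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java GlobWalker <root-dir> '<glob>'
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "myfiles*.txt";
        for (Path p : findMatching(root, glob)) {
            System.out.println(p);
        }
    }
}
```

Note that the glob has to be quoted on the command line (for example 'myfiles*.txt'), otherwise the shell expands it before Java sees it.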

It is better to pass a directory name and have Java walk the directory tree itself, rather than relying on shell-specific "wildcards".
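One way this could look, assuming a single (non-recursive) directory is enough, is Files.newDirectoryStream, which accepts a glob and matches it against each entry's file name. The class name DirectoryGlob and the method list are invented for this sketch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, named for this example only.
class DirectoryGlob {

    /** Lists the entries of dir whose file name matches the glob, in sorted order. */
    static List<Path> list(Path dir, String glob) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob is applied to file names inside dir; subdirectories are not searched.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        // Directory streams have no guaranteed order, so sort for stable output.
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java DirectoryGlob <dir> '<glob>'
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "myfiles*.txt";
        for (Path p : list(dir, glob)) {
            System.out.println(p);
        }
    }
}
```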

I would use java.io.File to iterate over the entire directory, and then filter the filenames using regular expressions. You can convert a wildcard expression to a regular expression using this code:

/**
 * Converts wildcard expression to regular expression. In wildcard-format,
 * '*' = 0-N characters and '?' = any one character.
 * @param wildcardExp wildcard expression string
 * @param buf buffer which receives the regular expression
 */
static public void wildcardToRegexp(FastStringBuffer wildcardExp, FastStringBuffer buf) {
    final int len = wildcardExp.size();
    buf.clear();
    for (int i = 0; i < len; i++) {
        char c = wildcardExp.charAt(i);
        switch (c) {
        case '*':
            buf.append('.');
            buf.append('*');
            break;
        case '?':
            buf.append('.');
            break;
        // escape special regexp-characters

        case '(':
        case ')':
        case '[':
        case ']':
        case '$':
        case '^':
        case '.':
        case '{':
        case '}':
        case '|':
        case '\\':
        case '+':
            buf.append('\\');
            buf.append(c);
            break;
        default:
            buf.append(c);
            break;
        }
    }
}
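FastStringBuffer is not part of the standard JDK, so the method above will not compile on its own. Below is a sketch of the same conversion using only String and StringBuilder, together with the java.io.File directory filtering described earlier. The class name WildcardFilter and the method names are made up for this example.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class WildcardFilter {

    /** Converts a wildcard expression ('*' and '?') into a regular expression. */
    static String wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    sb.append(".*");   // any run of characters
                    break;
                case '?':
                    sb.append('.');    // exactly one character
                    break;
                default:
                    // Escape the same regexp metacharacters as the original method.
                    if ("()[]$^.{}|\\+".indexOf(c) >= 0) {
                        sb.append('\\');
                    }
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the entries of dir whose name matches the wildcard (empty if dir is unreadable). */
    static File[] listMatching(File dir, String wildcard) {
        Pattern pattern = Pattern.compile(wildcardToRegex(wildcard));
        File[] files = dir.listFiles((d, name) -> pattern.matcher(name).matches());
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        // Assumed usage: java WildcardFilter <dir> '<wildcard>'
        File dir = new File(args.length > 0 ? args[0] : ".");
        String wildcard = args.length > 1 ? args[1] : "myfiles*.txt";
        for (File f : listMatching(dir, wildcard)) {
            System.out.println(f);
        }
    }
}
```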

Look at the Java Grep Library. It is close to your task, but it does not support wildcards.

Apache provides a class that supports wildcards: http://cleanjava.wordpress.com/2012/03/21/wildcard-file-filter-in-java/

In case this isn't obvious to someone, as it wasn't to me at first: if the files are local, you can let the POSIX shell expand the wildcard for you, and the matching files will be passed to main(String[] args) as separate arguments. In my case, I had a few other parameters, so I just moved the wildcard argument to the end.

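// USAGE: java MyProcessor arg1 arg2 myfiles*.txt

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public class MyProcessor {
  public static void main(String[] args) throws Exception {
    String arg1 = args[0];
    String arg2 = args[1];

    // The shell has already expanded the wildcard, so args[2] onwards are the file names
    for (int i = 2; i < args.length; i++) {
      File inputFile = new File(args[i]).getCanonicalFile();
      // try-with-resources closes the reader even if processing throws
      try (BufferedReader in = new BufferedReader(new FileReader(inputFile))) {
        // ...
      }
    }
  }
}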
