
Parse groups from regex string?

If I have the following regex strings:

String one = "\"/^[^.]+$|\\.(?!(avi|bmp)$)([^.]+$)/i\"";
String two = "\"/^.*\\.(txt)$/i\"";

Suppose I just want to parse the file extensions out of these strings. For example, I'd like:

List<String> fileExtensionsOne = getFileExtensionsFromRegex(one); // Returns ("avi", "bmp")
List<String> fileExtensionsTwo = getFileExtensionsFromRegex(two); // Returns ("txt")

What's the best way to implement getFileExtensionsFromRegex? Is it possible to convert the strings to Java regex objects and grab the groups out of them, i.e. without applying the pattern to some input text?

Edit: I think I can rely on the regex patterns staying fairly consistent. They will look either like this:

'/^.*\\.(' + _map(extensions, 'text').join('|') + ')$/i'

or this:

'/^[^.]+$|\\.(?!(' + _map(extensions, 'text').join('|') + ')$)([^.]+$)/i'

My approach would be to write a regex that analyzes the regex, something like

.*\(([a-z0-9\|]+)\).*

(Disclaimer: I haven't checked it for correct regex syntax.)

This looks for a group inside the regex: an opening paren \( , then any number of letters, digits and pipes [a-z0-9\|]+ (assuming that file extensions consist of exactly these characters), then the closing paren \) . The extra parens just inside the \( and \) pair form a capturing group, so the content between the literal parens is returned as group(1).

In the first example, this should give avi|bmp, and in the second one txt.

Then do a split("\\|") on the group(1) result to get the individual extensions.
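The approach described above can be sketched in Java roughly as follows. This is only an illustration under the stated assumption that extensions consist of lowercase letters and digits. The class name ExtensionGroupParser is made up for the example, and find() is used instead of wrapping the pattern in .* on both sides, which locates the same group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
final class ExtensionGroupParser {
    // A literal "(", then a captured run of lowercase letters, digits and pipes, then a literal ")".
    // Groups such as "(?!" or "([^.]+$)" do not match because they contain other characters.
    private static final Pattern GROUP = Pattern.compile("\\(([a-z0-9|]+)\\)");

    static List<String> getFileExtensionsFromRegex(String regex) {
        Matcher matcher = GROUP.matcher(regex);
        if (!matcher.find()) {
            // No group of the expected shape: return an empty list.
            return new ArrayList<>();
        }
        // group(1) is e.g. "avi|bmp"; split on the literal pipe character.
        return new ArrayList<>(Arrays.asList(matcher.group(1).split("\\|")));
    }
}
```

Only the first matching group is used, which fits the two pattern templates from the question. If a pattern could contain several such groups, the find() call would need to run in a loop.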

This might be what you need:

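import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> getFileExtensionsFromRegex(String s) {
    // Every run of two or more alphanumerics is treated as an extension;
    // the minimum length of two skips the one-letter "i" flag.
    Pattern pattern = Pattern.compile("[a-zA-Z0-9]{2,}");
    Matcher matcher = pattern.matcher(s);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(matcher.group());
    }
    return result;
}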

Your logic could start by checking each character against its ASCII code to see whether it is a letter. Here is a quick ASCII character reference:

{
"31": "", "32": " ", "33": "!", "34": "\"", "35": "#",
"36": "$", "37": "%", "38": "&", "39": "'", "40": "(",
"41": ")", "42": "*", "43": "+", "44": ",", "45": "-",
"46": ".", "47": "/", "48": "0", "49": "1", "50": "2",
"51": "3", "52": "4", "53": "5", "54": "6", "55": "7",
"56": "8", "57": "9", "58": ":", "59": ";", "60": "<",
"61": "=", "62": ">", "63": "?", "64": "@", "65": "A",
"66": "B", "67": "C", "68": "D", "69": "E", "70": "F",
"71": "G", "72": "H", "73": "I", "74": "J", "75": "K",
"76": "L", "77": "M", "78": "N", "79": "O", "80": "P",
"81": "Q", "82": "R", "83": "S", "84": "T", "85": "U",
"86": "V", "87": "W", "88": "X", "89": "Y", "90": "Z",
"91": "[", "92": "\\", "93": "]", "94": "^", "95": "_",
"96": "`", "97": "a", "98": "b", "99": "c", "100": "d",
"101": "e", "102": "f", "103": "g", "104": "h", "105": "i",
"106": "j", "107": "k", "108": "l", "109": "m", "110": "n",
"111": "o", "112": "p", "113": "q", "114": "r", "115": "s",
"116": "t", "117": "u", "118": "v", "119": "w", "120": "x",
"121": "y", "122": "z", "123": "{", "124": "|", "125": "}",
"126": "~", "127": ""
}

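For example, the lowercase letters run from

String.fromCharCode(97) //will return 'a'

to

String.fromCharCode(122) //will return 'z'

(String.fromCharCode is JavaScript; in Java, the cast (char) 97 gives the same character.) If a character is a letter, keep comparing the following characters until you reach one that is not a letter.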
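A minimal Java sketch of this character-by-character scan might look like the following. The class name AsciiRunScanner and the MIN_LENGTH cutoff are assumptions added for illustration. The cutoff is there because a plain letter scan would otherwise also pick up the trailing "i" flag of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is not from the original post.
final class AsciiRunScanner {
    // Assumption: an extension is a run of at least two ASCII letters,
    // which filters out the one-letter "i" flag at the end of the pattern.
    private static final int MIN_LENGTH = 2;

    // ASCII codes 65-90 are 'A'-'Z' and 97-122 are 'a'-'z'.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> getFileExtensionsFromRegex(String s) {
        List<String> result = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= s.length(); i++) {
            // Treat the end of the string as a non-letter so the last run is flushed.
            boolean letter = i < s.length() && isAsciiLetter(s.charAt(i));
            if (letter) {
                run.append(s.charAt(i));
            } else {
                if (run.length() >= MIN_LENGTH) {
                    result.add(run.toString());
                }
                run.setLength(0);
            }
        }
        return result;
    }
}
```

Because only letters are compared, an extension containing digits (such as mp4) would be cut short. Extending isAsciiLetter to accept codes 48-57 would cover that case.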
