Java regex String parse, trying to figure out a pattern

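import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

File file = new File("file-type-string-i-want-2000-01-01-01-01-01.conf.gz");
String fileName = file.getName();
// Current attempt: match from the first hyphen up to the 4-digit year
Matcher matcher = Pattern.compile("\\-(.*)\\-\\d{4}").matcher(fileName);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    sb.append(matcher.group());
}
List<String> stringList = Arrays.asList(sb.toString().split("-"));
String nameFragment = null;
if (stringList.size() >= 2) {
    nameFragment = stringList.get(stringList.size() - 2);
}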

Desired result is to extract

string-iwant 

from strings that look like this

file-type-string-iwant-2000-01-01-01-01-01.conf.gz 

Unfortunately, "string-iwant" is a variable-length run of alphanumeric characters that contains exactly ONE hyphen and never starts with a hyphen. The date format is consistent and the year always comes right after the string, so my current approach is to match on the -year, but I'm having difficulty excluding the stuff at the beginning.

Thanks for any thoughts or ideas

Edit: updated strings

Here's the regex you need (written as a Java string literal, hence the doubled backslashes):

\\-([^-]+\\-[^-]+)\\-\\d{4}\\-

Basically it means:

  • \\- starts with a minus
  • ([^-]+\\-[^-]+) one or more non-minus characters, then a minus, then one or more non-minus characters. This part is captured.
  • \\-\\d{4}\\- a minus sign, 4 digits, and another minus sign

However, that only works if the part you need contains exactly one hyphen (or a fixed number of hyphens, in which case the regex has to be adjusted to match). Otherwise, given the string file-type-string-i-want, there is no way to know whether the word type belongs to the string you want or not.
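As a minimal sketch of how this pattern could be used from Java: the class and method names below are hypothetical, and the hyphens are left unescaped because a hyphen outside a character class needs no backslash. The wanted part is read from capture group 1 after find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureBeforeYear {
    // A hyphen, the wanted "xxx-yyy" part (captured), then "-YYYY-".
    private static final Pattern NAME =
            Pattern.compile("-([^-]+-[^-]+)-\\d{4}-");

    // Returns the captured fragment, or null when the file name does not fit.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(
                extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

With a second hyphen in the wanted part (string-i-want), the same call returns only i-want, which is the limitation described above.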

Added:

In case the file-type always contains exactly one hyphen, you can capture the required part this way:

[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-

Explanation:

  • [^-]+\\-[^-]+\\- some non-hyphen characters, then a hyphen, then more non-hyphen characters and another hyphen. This skips the file-type string together with the hyphen that follows it.
  • \\-\\d{4}\\- a hyphen, 4 digits, and another hyphen
  • (.*) everything between the previous two parts is captured as the string you need
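The prefix-skipping variant could be sketched in Java as follows, assuming the file-type prefix really does contain exactly one hyphen. The class and method names are hypothetical. Because (.*) is greedy, it extends to the last place where a hyphen, four digits, and a hyphen follow, so the captured part may itself contain any number of hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipTwoPartPrefix {
    // "xxx-yyy-" prefix, then the wanted part (captured), then "-YYYY-".
    private static final Pattern NAME =
            Pattern.compile("[^-]+-[^-]+-(.*)-\\d{4}-");

    // Returns the captured fragment, or null when the file name does not fit.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(
                extract("file-type-string-i-want-2000-01-01-01-01-01.conf.gz"));
    }
}
```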

If this were PHP, I would use something like the following to capture the string.

/^(\w+\-){2}(?<string>.+?)\-\d{4}(\-\d{2}){5}(\.\w+){2}$/
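Since the question is about Java, here is a rough port of that PHP pattern, offered as a sketch rather than as part of the original answer. Java also supports named groups with the (?<name>...) syntax. The class and method names are hypothetical, and matches() is used because the pattern is anchored at both ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupPort {
    // Two "word-" prefixes, the wanted part (named group "string"),
    // "-YYYY", five "-dd" date/time fields, and two ".ext" suffixes.
    private static final Pattern NAME = Pattern.compile(
            "^(\\w+-){2}(?<string>.+?)-\\d{4}(-\\d{2}){5}(\\.\\w+){2}$");

    // Returns the named group, or null when the whole name does not match.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group("string") : null;
    }

    public static void main(String[] args) {
        System.out.println(
                extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

This version is stricter than the others: it rejects names whose timestamp or extension does not have the full expected shape.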

For this purpose I would use a regex with a positive lookahead:

Pattern p = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

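This matches two runs of non-hyphen characters joined by exactly one hyphen, provided they are immediately followed by a hyphen and a 4-digit year. The lookahead checks for the year without making it part of the match.

After a successful matcher.find(), you can simply take matcher.group(0) as your matched text, which will be string-iwant in this case.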
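Put together, the lookahead approach might look like the sketch below. The class and method names are hypothetical; the pattern is the one shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadExtract {
    // "xxx-yyy" that is followed by "-YYYY"; the lookahead is not consumed.
    private static final Pattern NAME =
            Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

    // Returns the first match, or null when nothing matches.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(
                extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Because no capture group is needed, group(0) (the whole match) is already the wanted text.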
