Parsing text with regular expressions

Question

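I am trying to parse a string that has two key components: one tells me the time slice, the other tells me the position.

This is what the text looks like: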

KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif

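{iiii} is the position and {ttt} is the time slice.

I need to replace {ttt} and {iiii} with numbers so that I get a complete file name. For example, position 1 and time slice 1 = KB_H9Oct4GFP_20130305_p0000001t000000001z001c02.tif

Here is how I am parsing them so far: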

    int startTimeSlice = 1;
    int startTile = 1;
    String regexTime = "([^{]*)\\{([t]+)\\}(.*)";
    Pattern patternTime = Pattern.compile(regexTime);       
    Matcher matcherTime = patternTime.matcher(filePattern);

    if (!matcherTime.find() || matcherTime.groupCount() != 3)
    {

        throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
    }

    String timePrefix = matcherTime.group(1);
    int tCount = matcherTime.group(2).length();
    String timeSuffix = matcherTime.group(3);

    String timeMatcher = timePrefix + "%0" + tCount + "d" + timeSuffix;


    String timeFileName = String.format(timeMatcher, startTimeSlice);

    String regex = "([^{]*)\\{([i]+)\\}(.*)";
    Pattern pattern = Pattern.compile(regex);       
    Matcher matcher = pattern.matcher(timeFileName);        



    if (!matcher.find() || matcher.groupCount() != 3)
    {
        throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
    }

    String prefix = matcher.group(1);
    int iCount = matcher.group(2).length();
    String suffix = matcher.group(3);

    String nameMatcher = prefix + "%0" + iCount + "d" + suffix;

    String fileName = String.format(nameMatcher, startTile);

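Unfortunately my code does not work: it fails at the check of whether the second matcher finds anything in timeFileName.

After the first regex pass, timeFileName comes out as 000000001z001c02.tif, so the beginning of the string, including {iiii}, has been cut off.

I cannot assume which group comes first ({iiii} or {ttt}), so I am trying to design a solution that handles {ttt} first and then {iiii}.

Here is another example of valid text that I am also trying to parse: F_{iii}_{ttt}.tif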

Answer 1

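Steps to follow:

Find the string {ttt...} in the file name
Build a number format from the number of "t" characters in that string
Find the string {iiii...} in the file name
Build a number format from the number of "i" characters in that string
Use String.replace() to substitute the time and the position

Here is the code: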

import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filePattern = "KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif";
int startTimeSlice = 1;
int startTile = 1;

// Locate the time placeholder, braces included.
Pattern patternTime = Pattern.compile("(\\{[t]*\\})");
Matcher matcherTime = patternTime.matcher(filePattern);

if (matcherTime.find()) {
    String timePattern = matcherTime.group(0); // {ttt}

    // Turn {ttt} into {000}, then strip the braces to get the format 000.
    NumberFormat timingFormat = new DecimalFormat(timePattern.replaceAll("t", "0")
            .substring(1, timePattern.length() - 1)); // 000

    // Locate the position placeholder the same way.
    Pattern patternPosition = Pattern.compile("(\\{[i]*\\})");
    Matcher matcherPosition = patternPosition.matcher(filePattern);

    if (matcherPosition.find()) {
        String positionPattern = matcherPosition.group(0); // {iiii}

        NumberFormat positionFormat = new DecimalFormat(positionPattern
                .replaceAll("i", "0").substring(1, positionPattern.length() - 1)); // 0000

        // Substitute both placeholders in the original pattern.
        System.out.println(filePattern.replace(timePattern,
                timingFormat.format(startTimeSlice)).replace(positionPattern,
                positionFormat.format(startTile)));
    }
}
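The same find-and-replace idea can also be done in a single pass, which makes the order of the two placeholders irrelevant. The sketch below is not part of the original answer; the class name SinglePassFill and the method fill are made up for illustration, and it assumes the only placeholders are runs of i or runs of t inside braces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassFill {
    // Matches {iii...} or {ttt...}; group 1 holds the letters without the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(i+|t+)\\}");

    // Hypothetical helper: replaces every placeholder with a zero-padded number
    // whose width equals the number of letters in the placeholder.
    static String fill(String filePattern, int tile, int timeSlice) {
        Matcher m = PLACEHOLDER.matcher(filePattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String letters = m.group(1);
            int value = letters.charAt(0) == 'i' ? tile : timeSlice;
            String digits = String.format("%0" + letters.length() + "d", value);
            // quoteReplacement guards against '$' or '\' being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(digits));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif", 1, 1));
        System.out.println(fill("F_{iii}_{ttt}.tif", 2, 3));
    }
}
```

Because the rest of the file name is copied through untouched, a literal % in the name would not upset String.format here.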

Answer 2

Your first pattern looks like this:

String regexTime = "([^{]*)\\{([t]+)\\}(.*)";

This finds a substring made up of zero or more non-{ characters, followed by {t...t}, followed by any other characters.

When your input is

KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif

the first substring that matches is

iiii}t00000{ttt}z001c02.tif

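The { before the i cannot be part of the match, because you told the regex that the prefix may contain only non-{ characters. As a result, when you reassemble the string for the second match, it starts with iiii}, so it does not match {iiii} the way you intended.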
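To see this concretely, here is a small sketch that runs the question's original regex with find() and returns the three groups. The class name FirstMatchDemo and the method groupsOf are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstMatchDemo {
    // Returns groups 1..3 captured by the question's original time regex.
    static String[] groupsOf(String filePattern) {
        Matcher m = Pattern.compile("([^{]*)\\{([t]+)\\}(.*)").matcher(filePattern);
        if (!m.find()) {
            throw new IllegalArgumentException("No {t...} placeholder in: " + filePattern);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groupsOf("KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif");
        System.out.println("prefix = " + g[0]); // iiii}t00000
        System.out.println("ts     = " + g[1]); // ttt
        System.out.println("suffix = " + g[2]); // z001c02.tif
    }
}
```

The prefix group starts after the first {, so everything up to and including that brace is lost when the groups are glued back together.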

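When you are looking for {ttt...}, I see no reason to exclude { or any other character from the first part of the string. So changing the regex to

"^(.*)\\{(t+)\\}(.*)$"

may be a simple way to fix the problem. Note that if you want to be sure the groups cover the whole beginning and the whole end of the string, you should include ^ and $ to match the start and end of the string respectively; otherwise the matcher engine might decide not to include everything. In this case it will not, but it is a good habit anyway, because it makes the intent explicit and does not require anyone to know the difference between "greedy" and "reluctant" matching. Alternatively, use matches() instead of find(), since matches() automatically tries to match the entire string.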
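As a rough sketch of how the corrected pattern could be used for both placeholders, the helper below builds the anchored regex for a given letter and uses matches(). The names PlaceholderFill and fill are assumptions made for this example, not part of the question's code. It formats only the number rather than the whole file name, which is a small deviation from the question's approach.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderFill {
    // Hypothetical helper: replaces the placeholder made of the given letter,
    // e.g. {ttt} for 't', with the value zero-padded to the placeholder's width.
    static String fill(String filePattern, char letter, int value) {
        if (!Character.isLetter(letter)) {
            throw new IllegalArgumentException("Placeholder must be a letter: " + letter);
        }
        // For 't' this builds ^(.*)\{(t+)\}(.*)$
        String regex = "^(.*)\\{(" + letter + "+)\\}(.*)$";
        Matcher m = Pattern.compile(regex).matcher(filePattern);
        if (!m.matches()) {
            throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
        }
        int width = m.group(2).length();
        return m.group(1) + String.format("%0" + width + "d", value) + m.group(3);
    }

    public static void main(String[] args) {
        String pattern = "KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif";
        String withTime = fill(pattern, 't', 1);
        System.out.println(withTime);
        System.out.println(fill(withTime, 'i', 1));
    }
}
```

Since the prefix group is now (.*), the {iiii} placeholder survives the first pass, and the two calls can be made in either order.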

Answer 3

OK, so after some testing I found a way to handle this case:

To parse {ttt} I can use the regex: (.*)\\{t([t]+)\\}(.*)

This means I have to add 1 to tCount to account for the t consumed by \\{t

For {iii}: (.*)\\{i([i]+)\\}(.*)
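A minimal sketch of that adjustment, assuming the regex above is written as a Java string literal; the class name AnswerThreeSketch and the method fillTime are invented for illustration. Note that this pattern needs at least two t characters, so a single {t} would not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnswerThreeSketch {
    // Hypothetical helper: fills the {ttt} placeholder using the pattern from this answer.
    static String fillTime(String filePattern, int timeSlice) {
        Matcher m = Pattern.compile("(.*)\\{t([t]+)\\}(.*)").matcher(filePattern);
        if (!m.find()) {
            throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
        }
        // Group 2 misses the first t, which was consumed by the literal "\\{t".
        int tCount = m.group(2).length() + 1;
        return m.group(1) + String.format("%0" + tCount + "d", timeSlice) + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(fillTime("F_{iii}_{ttt}.tif", 7));
    }
}
```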

Answer 4

Perhaps a simpler approach (as confirmed at http://regex101.com/r/vG7kY7) is

(\{i+\}).*(\{t+\})

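You do not need [] around a single character you want to match. Keep it simple. i+ means "one or more i", and the expression works as long as the placeholders appear in the given order (the first match is {iiii}, the second is {ttttt}).

When writing it in a string, you may need to escape the backslashes...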
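In Java, for instance, each backslash is doubled inside the string literal. The sketch below uses that escaped form and the group offsets to fill both placeholders; the class name AnswerFourSketch and the method fill are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnswerFourSketch {
    // The regex (\{i+\}).*(\{t+\}) written as a Java string literal.
    private static final Pattern BOTH = Pattern.compile("(\\{i+\\}).*(\\{t+\\})");

    // Hypothetical helper: fills {iii...} with the tile and {ttt...} with the time slice.
    static String fill(String filePattern, int tile, int timeSlice) {
        Matcher m = BOTH.matcher(filePattern);
        if (!m.find()) {
            throw new IllegalArgumentException("Incorrect filePattern: " + filePattern);
        }
        // Each group includes its braces, so subtract 2 to get the digit count.
        int iWidth = m.group(1).length() - 2;
        int tWidth = m.group(2).length() - 2;
        return filePattern.substring(0, m.start(1))
                + String.format("%0" + iWidth + "d", tile)
                + filePattern.substring(m.end(1), m.start(2))
                + String.format("%0" + tWidth + "d", timeSlice)
                + filePattern.substring(m.end(2));
    }

    public static void main(String[] args) {
        System.out.println(fill("KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif", 1, 1));
    }
}
```

This only works when {i...} comes before {t...}; for a name such as F_{ttt}_{iii}.tif, find() returns false and the sketch throws.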

4 answers

Answer 1
1 (accepted) 2014-03-04 21:15:35

Answer 2
0 2014-03-04 20:51:06

Answer 3
0 2014-03-04 20:55:09

Answer 4
0 2014-03-04 21:15:57
