
Regex for capturing values in a delimited list

I'm trying to write a regex that will extract clean values from a delimited list. The catch is that the list could be delimited by different symbols or words. The captured values will be trimmed in the code, so spaces don't matter.

Input:

English (UK), French* , German and Polish  & Russian; Portugese and Italian

Regex I have so far:

    \A(?:(?<Value>[^,;&*]+)[,;&\s*]*)*\Z

The delimiters I'm expecting are , ; and &. I also included * because I want it excluded from the captured values.

Captured values:

English (UK), French, German and Polish, Russian, Portugese and Italian
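For reference: in .NET, a named group inside a repeated group records every iteration in Group.Captures (Group.Value only holds the last one), and that is how a list like the one above is read out. A minimal sketch, assuming .NET 5 or later with top-level statements; the Trim() call stands in for the trimming the question says happens later in the code:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "English (UK), French* , German and Polish  & Russian; Portugese and Italian";

// The expression from the question, unchanged.
Match m = Regex.Match(input, @"\A(?:(?<Value>[^,;&*]+)[,;&\s*]*)*\Z");

// Groups["Value"].Value would only be the last capture;
// Captures holds one entry per repetition of the outer group.
string[] values = m.Success
    ? m.Groups["Value"].Captures.Cast<Capture>().Select(c => c.Value.Trim()).ToArray()
    : Array.Empty<string>();

Console.WriteLine(string.Join(", ", values));
// English (UK), French, German and Polish, Russian, Portugese and Italian
```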

Expected values:

English (UK), French, German, Polish, Russian, Portugese, Italian

The problem is that I can't get the word "and" to be treated as a delimiter.

I don't think a regex is necessary here; String.Split can do the job:

    using System;

    string str = "English (UK), French* , German and Polish  & Russian; Portugese and Italian";
    // " and " (with its spaces) is a delimiter too; Trim() removes the leftover padding
    string[] results = str.Split(new string[] { ",", ";", "&", "*", " and " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string s in results)
        if (!string.IsNullOrWhiteSpace(s))
            Console.WriteLine(s.Trim());

This is what I came up with:

    \A(?:(?<Value>(?:[^,;&*\s]|\s(?!and))+)(?:(?:and|[,;&\s*])*))*\Z

Explanation:

(?:...) is a non-capturing group: it groups like (...) but does not store what it matched.

(?!...) is a negative lookahead: it succeeds only if the text that follows does not match the given pattern, and it consumes no characters.

So whitespace is included in Value only when it is not followed by "and", and "and" is added as an alternative in the separator group.
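A minimal sketch, assuming .NET 5 or later with top-level statements, that runs this pattern on the input from the question and reads every Value capture. Some captures keep trailing spaces (for example "Polish  " before the &, because those spaces are not followed by "and"), so each one is trimmed, as the question already plans to do:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "English (UK), French* , German and Polish  & Russian; Portugese and Italian";

// Whitespace joins a value only when "and" does not follow it;
// "and" is one more alternative in the separator group.
string pattern = @"\A(?:(?<Value>(?:[^,;&*\s]|\s(?!and))+)(?:(?:and|[,;&\s*])*))*\Z";

Match m = Regex.Match(input, pattern);
string[] values = m.Success
    ? m.Groups["Value"].Captures.Cast<Capture>()
        .Select(c => c.Value.Trim()) // strips padding such as the spaces in "Polish  "
        .ToArray()
    : Array.Empty<string>();

Console.WriteLine(string.Join(", ", values));
// English (UK), French, German, Polish, Russian, Portugese, Italian
```

Note that the "and" in the lookahead and in the separator is case-sensitive, so "And" would still end up inside a value.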

This is rather complicated; you may prefer to replace " and " with one of your separators first and keep your current expression.
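A sketch of that simpler route, assuming .NET 5 or later with top-level statements: turn " and " into a comma, then run the question's original expression unchanged and trim each capture.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "English (UK), French* , German and Polish  & Russian; Portugese and Italian";

// Turn the word delimiter into one of the symbol delimiters first.
// Matching the surrounding spaces keeps words that merely contain "and" intact.
string normalized = input.Replace(" and ", ", ");

// The original expression from the question, unchanged.
Match m = Regex.Match(normalized, @"\A(?:(?<Value>[^,;&*]+)[,;&\s*]*)*\Z");
string[] values = m.Success
    ? m.Groups["Value"].Captures.Cast<Capture>().Select(c => c.Value.Trim()).ToArray()
    : Array.Empty<string>();

Console.WriteLine(string.Join(", ", values));
// English (UK), French, German, Polish, Russian, Portugese, Italian
```

If "and" may be surrounded by more than one space, Regex.Replace(input, @"\s+and\s+", ", ") is a more tolerant variant of the first step.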


Or just do this to your current result:

    desiredResult = currentResult.Replace(" and ", ", ");
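A minimal sketch of this post-processing step, assuming currentResult is the captured values joined into one string with ", " (that joined form is an assumption; the question lists the values but does not show how they are stored). It replaces " and " together with its surrounding spaces, so names that merely contain the letters "and" are left alone and no stray spaces remain:

```csharp
using System;

// Assumed: the current captures from the question, joined with ", ".
string currentResult = "English (UK), French, German and Polish, Russian, Portugese and Italian";

// Replace the word delimiter, spaces included, with the list separator.
string desiredResult = currentResult.Replace(" and ", ", ");

Console.WriteLine(desiredResult);
// English (UK), French, German, Polish, Russian, Portugese, Italian
```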

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM