
C# filter String with Regex

I'm not familiar with regex, but I think it could help me a lot with my problem.

I have two kinds of strings in a big List<string> str (with or without a description):

str[0] = "[toto]";
str[1] = "[toto] descriptionToto";
str[2] = "[titi]";
str[3] = "[titi] descriptionTiti";
str[4] = "[tata]";
str[5] = "[tata] descriptionTata";

The list isn't really ordered. I would like to go through the whole list and reformat each item depending on what it contains.

If I find "[toto]", I would like to set str[0] = "toto",

and if I find "[toto] descriptionToto", I would like to set str[1] = "descriptionToto".

Do you have any ideas about the best way to get this result?

There are two regex options if you ask me:

  1. Make a regex pattern with two capturing groups, then use group 1 or group 2 depending on whether group 1 is empty. In this case you'd use named capturing groups to get a clear relationship between the pattern and the code.

  2. Make a regex that matches either string type 1 or string type 2, in which case you would get your end result directly from the regex.
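
A minimal sketch of option 1, assuming every entry looks like "[name]" or "[name] description". The class name NamedGroupFilter, the method Extract and the group names name and desc are made up for illustration; here the choice is made on whether the description group is empty, which is one possible reading of the suggestion above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupFilter
{
    // "name" captures the text between the brackets,
    // "desc" captures whatever follows the closing bracket (possibly nothing).
    private static readonly Regex Pattern =
        new Regex(@"^\s*\[(?<name>[^\[\]]*)\]\s*(?<desc>.*?)\s*$");

    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return input; // assumption: leave unrecognised strings untouched

        string desc = m.Groups["desc"].Value;
        // Prefer the description when there is one, otherwise fall back to the name.
        return desc.Length > 0 ? desc : m.Groups["name"].Value;
    }
}
```

With this sketch, NamedGroupFilter.Extract("[toto]") gives toto and NamedGroupFilter.Extract("[toto] descriptionToto") gives descriptionToto.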

If you're going for speed, using str[0].IndexOf(']') would get most of the job done.
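
The IndexOf idea could look roughly like the following. This is only a sketch under the assumption that each string starts with a single bracketed name; IndexOfFilter and Extract are hypothetical names, not part of any library.

```csharp
using System;

public static class IndexOfFilter
{
    public static string Extract(string input)
    {
        int open = input.IndexOf('[');
        int close = input.IndexOf(']');
        if (open < 0 || close < open)
            return input; // not in the expected "[name] description" shape

        // Everything after the closing bracket, without surrounding whitespace.
        string rest = input.Substring(close + 1).Trim();

        // Use the description if present, otherwise the text inside the brackets.
        return rest.Length > 0
            ? rest
            : input.Substring(open + 1, close - open - 1);
    }
}
```

No regex engine is involved, so there is no pattern to compile or backtrack through; the trade-off is that malformed input has to be checked by hand.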

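Rather than regex, I'd be inclined to just use string.Split, something along the lines of:

// Splitting "[toto] descriptionToto" on '[' and ']' gives
// tokens[0] = "", tokens[1] = "toto", tokens[2] = " descriptionToto".
string[] tokens = str[0].Split(new char[] { '[', ']' });
string description = tokens[2].Trim();   // drop the space after ']'
if (description == "") {
    str[0] = tokens[1];      // no description: keep the bracketed name
} else {
    str[0] = description;    // otherwise keep the description
}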

If you only want the description for the entries that contain one:

You can split each string at the space character and store the second element of the resulting array back into the list; that element is the description. If there is no description, the string contains no space and the array has only one element. So loop over the list and call Split(' ') on each item, which splits a string with a description into two elements:

for (int i = 0; i < str.Count; i++)
{
    string[] words = str[i].Split(' ');
    if (words.Length > 1)
    {
        str[i] = words[1];   // the part after the space is the description
    }
}
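
One caveat with splitting on every space is that a description containing spaces would be cut after its first word. A hedged variant, assuming that case can occur in your data, limits the split to two parts; SplitFilter and KeepDescriptions are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class SplitFilter
{
    public static void KeepDescriptions(List<string> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            // Split at the first space only, so the description stays in one piece.
            string[] parts = items[i].Split(new[] { ' ' }, 2);
            if (parts.Length > 1)
                items[i] = parts[1];
            // Entries without a space (no description) are left unchanged.
        }
    }
}
```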

If those are the string values themselves and not the literal variable notation, this should work.
The replacement simply concatenates capture groups 1 and 2; only one of them takes part in any given match, so the other is empty.

Find: ^\\s*(?:\\[([^\\[\\]]*)\\]\\s*|\\[[^\\[\\]]*\\]\\s*((?:\\s*\\S)+\\s*))$
Replace: "$1$2"

 ^ 
 \s* 
 (?:
      \[  
      ( [^\[\]]* )                # (1)
      \]   \s* 
   |  
      \[  [^\[\]]* \]
      \s*  
      (                           # (2 start)
           (?: \s* \S )+
           \s* 
      )                           # (2 end)
 )
 $

.NET test case

 string str1 = "[titi]";
 Console.WriteLine( Regex.Replace(str1, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));
 string str2 = "[titi] descriptionTiti";
 Console.WriteLine( Regex.Replace(str2, @"^\s*(?:\[([^\[\]]*)\]\s*|\[[^\[\]]*\]\s*((?:\s*\S)+\s*))$", @"$1$2"));

Output >>

 titi
 descriptionTiti

You can use a single regex:

string s = Regex.Match(str[0], @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value;

The idea is simple: if the text ends with ] and contains no other ], take everything between [ and ]; otherwise take everything after the first ] that is followed by a space.

Sample code:

List<string> strList = new List<string> {
    "[toto]",
    "[toto] descriptionToto",
    "[titi]",
    "[titi] descriptionTiti",
    "[tata]",
    "[tata] descriptionTata" };
foreach(string str in strList)
    Console.WriteLine(Regex.Match(str, @"(?<=\[)[^\]]*(?=]$)|(?<=] ).*").Value);

Sample output:

toto
descriptionToto
titi
descriptionTiti
tata
descriptionTata
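
Since the question asks to overwrite the list entries rather than print them, the same pattern could be applied in place. This is a sketch only; InPlaceFilter and Apply are hypothetical names, and entries that do not match are left as they are by assumption. A for loop is used because list elements cannot be reassigned inside foreach.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InPlaceFilter
{
    // Same pattern as in the answer above, compiled once and reused.
    private static readonly Regex Pattern =
        new Regex(@"(?<=\[)[^\]]*(?=]$)|(?<=] ).*");

    public static void Apply(List<string> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            Match m = Pattern.Match(items[i]);
            if (m.Success)
                items[i] = m.Value; // replace the entry with the extracted text
        }
    }
}
```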

