
Using Regex to split string by different characters based on occurrence

I'm currently replacing a very old (and long) C# string parsing class that I think could be condensed into a single regex statement. Being a newbie to Regex, I'm having some issues getting it working correctly.

Description of the possible input strings:

The input string can have up to three words separated by spaces. It can stop there, or it can have an = followed by any number of additional words separated by commas. The words can also be contained in quotes. If a word is in quotes and has a space, it should NOT be split by the space.

Examples of input and expected output elements in the string array:

Input1: this is test

Output1: {"this", "is", "test"}

Input2: this is test=param1,param2,param3

Output2: {"this", "is", "test", "param1", "param2", "param3"}

Input3: use file "c:\test file.txt"=param1, param2,param3

Output3: {"use", "file", "c:\test file.txt", "param1", "param2", "param3" }

Input4: log off

Output4: {"log", "off"}

And the most complex one:

Input5: use object "c:\test file.txt"="C:\Users\layer.shp" | ( object = 10 ),param2

Output5: {"use", "object", "c:\test file.txt", "C:\Users\layer.shp | ( object = 10 )", "param2"}

So to break this down:

  • I need to split by spaces up to the first three words
  • Then, if there is an = , ignore the = and split by commas instead.
  • If one of the first three words is surrounded by quotes and contains a space, INCLUDE that space (don't split)

Here's the closest regex I've got:

\w+|"[\w\s\:\\\.]*"+([^,]+)

This seems to split the string based on spaces, and by commas after the = . However, it seems to include the = for some reason if one of the first three words is surrounded by quotes. Also, I'm not sure how to split by space only up to the first three words in the string, and the rest by comma if there is an = .
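To see why the = ends up in a match, it can help to run the pattern and print each match. The sketch below is only an illustration: the class and method names (RegexDemo, FindTokens) are made up for this example. The quoted alternative of the pattern requires ([^,]+) right after the closing quote, so that branch keeps consuming everything up to the next comma, including the = and the first parameter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // The pattern from the question, written as a C# verbatim string
    // (a doubled "" stands for one literal quote character).
    const string Pattern = @"\w+|""[\w\s\:\\\.]*""+([^,]+)";

    // Hypothetical helper: returns the text of every match, in order.
    public static string[] FindTokens(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        string input = "use file \"c:\\test file.txt\"=param1, param2,param3";
        foreach (string token in FindTokens(input))
            Console.WriteLine("[" + token + "]");
        // Prints:
        // [use]
        // [file]
        // ["c:\test file.txt"=param1]
        // [param2]
        // [param3]
    }
}
```

In an input without quotes, such as Input2, the = is simply skipped because \w+ never matches it, which is why the problem only shows up with quoted words.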

It looks like part of my solution is to use quantifiers with {} , but I've been unable to set it up properly.

Here is a solution without Regex. Regex should only be used when string methods cannot do the job:

            // Requires: using System; using System.Collections.Generic; using System.Linq;
            string[] inputs = { 
                              "this is test",
                              "this is test=param1,param2,param3",
                              "use file \"c:\\test file.txt\"=param1 , param2,param3",
                              "log off",
                              "use object \"c:\\test file.txt\"=\"C:\\Users\\layer.shp\" | ( object = 10 ),param2"
                          };

            foreach (string input in inputs)
            {
                List<string> splitArray;
                int equalPosition = input.IndexOf('=');
                if (equalPosition < 0)
                {
                    // No '=': every space-separated word is an element.
                    splitArray = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
                }
                else
                {
                    // Words before the first '=' are split on spaces.
                    // Note: this also splits a quoted word that contains a space.
                    splitArray = input.Substring(0, equalPosition).Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
                    // Everything after the first '=' is split on commas; trim stray spaces.
                    string end = input.Substring(equalPosition + 1);
                    splitArray.AddRange(end.Split(new char[] { ',' }).Select(x => x.Trim()));
                }
                // Wrap each element in quotes unless it already contains one.
                string output = string.Join(",", splitArray.Select(x => x.Contains("\"") ? x : "\"" + x + "\""));
                Console.WriteLine(output);
            }
            Console.ReadLine();
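
A plain split on spaces does not keep a quoted word such as "c:\test file.txt" together. One possible way to handle that, shown here only as a sketch, is to find the first = that lies outside quotes, take the words before it with a small regex, and split the rest on commas. The names CommandParser and Parse are hypothetical and not from the original post. The sketch also assumes that quotes are always balanced, that no comma appears inside a quoted parameter, and that quote characters should be stripped from the results, as Output5 suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class CommandParser
{
    // Hypothetical helper: splits an input line into its elements.
    public static List<string> Parse(string input)
    {
        // Find the first '=' that is not inside a quoted word.
        int split = -1;
        bool inQuotes = false;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '"') inQuotes = !inQuotes;
            else if (input[i] == '=' && !inQuotes) { split = i; break; }
        }

        string head = split < 0 ? input : input.Substring(0, split);
        var result = new List<string>();

        // Head: either a quoted run (spaces allowed) or a run of non-space characters.
        foreach (Match m in Regex.Matches(head, "\"[^\"]*\"|\\S+"))
            result.Add(m.Value.Replace("\"", ""));

        // Tail: comma-separated parameters, trimmed, with quotes removed.
        if (split >= 0)
        {
            result.AddRange(input.Substring(split + 1)
                .Split(',')
                .Select(p => p.Trim().Replace("\"", "")));
        }
        return result;
    }

    static void Main()
    {
        string input = "use object \"c:\\test file.txt\"=\"C:\\Users\\layer.shp\" | ( object = 10 ),param2";
        foreach (string part in Parse(input))
            Console.WriteLine("[" + part + "]");
        // Prints:
        // [use]
        // [object]
        // [c:\test file.txt]
        // [C:\Users\layer.shp | ( object = 10 )]
        // [param2]
    }
}
```

The sketch does not enforce the three-word limit before the = ; if that matters, the count of head elements could be checked after parsing.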
