简体   繁体   中英

What is the best way to parse this string in C#?

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:

 key:value[space]key:value[space]key:value[space]

So I wrote this code to parse it:

string myString = ReadinString();
string[] tokens = myString.split(' ');
foreach (string token in tokens) {
     string key = token.split(':')[0];
     string value = token.split(':')[1];
     .  . . . 
}

The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.

NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.

Use this regular expression:

\w+:[\w\s]+(?![\w+:])

I tested it on

test:testvalue test2:test value test3:testvalue3

It returns three matches:

test:testvalue
test2:test value
test3:testvalue3

You can change \w to any character set that can occur in your input.

Code for testing this:

var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";

foreach (Match match in regex.Matches(test))
{
    var key = match.Value.Split(':')[0];
    var value = match.Value.Split(':')[1];

    Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();

As Wonko the Sane pointed out, this regular expression will fail on values with : . If you predict such situation, use \w+:[\w: ]+?(?:[\w+:]) as the regular expression. This will still fail when a colon in value is preceded by space though... I'll think about solution to this.

This cannot work without changing your split from a space to something else such as a "|".

Consider this:

Alfred Bester: Alfred Bester Alfred :Alfred Bester

  • Is this Key "Alfred Bester" & value Alfred" or Key "Alfred" & value "Bester Alfred"?
string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";

foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
   Console.WriteLine("{0} = {1}", 
       match.Groups[1].Value, 
       match.Groups[2].Value
      );
}

Gives you:

foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan

You could try to Url encode the content between the space ( The keys and the values not the: symbol ) but this would require that you have control over the Input Method.

Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.

If you can't control the input format you could always use a Regular expression and that searches for single spaces where a word plus: follows.

Update (Thanks Jon Grant) It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.

string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);

The first will produce an array:

"key:value", "key:value"

Then an array of arrays:

{ "key", "value" }, { "key", "value" }

And then a dictionary:

"key" => "value", "key" => "value"

Note, that Dictionary<K,V> doesn't allow duplicated keys, it will raise an exception in such a case. If such a scenario is possible, use ToLookup() .

Using a regular expression can solve your problem:

private void DoSplit(string str)
{
    str += str.Trim() + " ";
    string patterns = @"\w+:([\w+\s*])+[^!\w+:]";
    var r = new System.Text.RegularExpressions.Regex(patterns);
    var ms = r.Matches(str);
    foreach (System.Text.RegularExpressions.Match item in ms)
    {
        string[] s = item.Value.Split(new char[] { ':' });
        //Do something
    }
}

I guess you could take your method and expand upon it slightly to deal with this stuff...

Kind of pseudocode:

List<string> parsedTokens = new List<String>();
string[] tokens = myString.split(' ');
for(int i = 0; i < tokens.Length; i++)
{
    // We need to deal with the special case of the last item, 
    // or if the following item does not contain a colon.
    if(i == tokens.Length - 1 || tokens[i+1].IndexOf(':' > -1)
    {
        parsedTokens.Add(tokens[i]);
    }
    else
    {
        // This bit needs to be refined to deal with values with multiple spaces...
        parsedTokens.Add(tokens[i] + " " + tokens[i+1]);
    }
}

Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...

This code will do it (given the rules below). It parses the keys and values and returns them in a Dictonary<string, string> data structure. I have added some code at the end that assumes given your example that the last value of the entire string/stream will be appended with a [space]:

private Dictionary<string, string> ParseKeyValues(string input)
        {
            Dictionary<string, string> items = new Dictionary<string, string>();

            string[] parts = input.Split(':');

            string key = parts[0];
            string value;

            int currentIndex = 1;

            while (currentIndex < parts.Length-1)
            {
                int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
                value = parts[currentIndex].Substring(0, indexOfLastSpace);
                items.Add(key, value);
                key = parts[currentIndex].Substring(indexOfLastSpace + 1);
                currentIndex++;
            }
            value = parts[parts.Length - 1].Substring(0,parts[parts.Length - 1].Length-1);


            items.Add(key, parts[parts.Length-1]);

            return items;

        }

Note: this algorithm assumes the following rules:

  1. No spaces in the values
  2. No colons in the keys
  3. No colons in the values

Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):

    public static IEnumerable<KeyValuePair<string, string>> Split(string text)
    {
        if (text == null)
            yield break;

        int keyStart = 0;
        int keyEnd = -1;
        int lastSpace = -1;
        for(int i = 0; i < text.Length; i++)
        {
            if (text[i] == ' ')
            {
                lastSpace = i;
                continue;
            }

            if (text[i] == ':')
            {
                if (lastSpace >= 0)
                {
                    yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
                    keyStart = lastSpace + 1;
                }
                keyEnd = i;
                continue;
            }
        }
        if (keyEnd >= 0)
            yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
    }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM