Parsing a formatted string with RegEx or similar

I have an application that sends a TCP message to a server and gets one back.

The reply it gets back is in this format:

0,"120"1,"Data Field 1"2,"2401"3,"Data Field 3"1403-1,"multiple occurence 1"1403-2,"multiple occurence 2"99,""

So basically it is a set of fields concatenated together. Each field has a tag, a comma, and a value, in that order: the tag is a number, the value is in double quotes, and the comma separates them. For example, in

0,"120"

0 is the tag and 120 is the value.

A complete message always starts with a 0 field and ends with a 99,"" field.

To complicate things, some tags contain a dash because the field is split into more than one value (for example 1403-1 and 1403-2).
The order of the numbers is not significant.
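Once a single field has been isolated, a dashed tag can be taken apart with plain string handling. The following C# sketch is illustrative only: TagParser and ParseTag are hypothetical names, and treating a tag without a dash as occurrence 1 is an assumption, not something the message format is stated to guarantee.

```csharp
using System;

// Hypothetical helper (not part of any FedEx library): splits a tag such as
// "1403-2" into its base tag ("1403") and occurrence number (2).
public static class TagParser
{
    public static (string BaseTag, int Occurrence) ParseTag(string tag)
    {
        int dash = tag.IndexOf('-');

        // Assumption: a tag without a dash is a single-occurrence field.
        if (dash < 0)
            return (tag, 1);

        string baseTag = tag.Substring(0, dash);
        int occurrence = int.Parse(tag.Substring(dash + 1));
        return (baseTag, occurrence);
    }
}
```

Grouping on BaseTag would then let all occurrences of 1403 be collected together regardless of the order in which they arrive.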

(For reference, this is a "FedEx Tagged Transaction" message.)

So I'm looking for a decent way to validate that we have a "complete" message (i.e. one that has both the 0 and 99 fields). Because it arrives over TCP, I guess I have to account for not having received the full message yet.
Then I need to split it up to get all the values I need.
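One way to handle partial reads is to keep appending received text to a buffer and only parse once the buffer looks complete. A minimal C# sketch, assuming the bytes have already been decoded to a string and that a reply is complete once it starts with 0," and ends with the closing quote of the previous value followed by 99,"". ReplyBuffer is a hypothetical name:

```csharp
using System;
using System.Text;

// Hypothetical receive buffer for one tagged-transaction reply.
public sealed class ReplyBuffer
{
    // Assumption: the 99 field always follows another field, so its tag is preceded
    // by that field's closing quote. Requiring the quote avoids a false positive on a
    // tag such as 199 with an empty value.
    private const string Terminator = "\"99,\"\"";
    private readonly StringBuilder _buffer = new StringBuilder();

    // Append whatever (already decoded) text the last socket read returned.
    public void Append(string chunk) => _buffer.Append(chunk);

    // True once the buffered text starts with the 0 field and ends with the 99 field.
    public bool IsComplete
    {
        get
        {
            string text = _buffer.ToString();
            return text.StartsWith("0,\"", StringComparison.Ordinal)
                && text.EndsWith(Terminator, StringComparison.Ordinal);
        }
    }

    public override string ToString() => _buffer.ToString();
}
```

The receive loop would call Append after each read and stop reading once IsComplete is true; the buffered string can then be handed to whichever parser is used.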

The best I have come up with for parsing is a poor regex plus some clean-up afterwards. The heart of it is this pattern, which I use to split the message:

(\d?\d?\d?\d?-?\d?\d,")

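// requires: using System.Text.RegularExpressions;
string s = @"(\d?\d?\d?\d?-?\d?\d,"")";
string[] strArray = Regex.Split(receivedData, s);

// 7 fields, so I expect 14 elements (a tag and a value for each)
Assert.AreEqual(14, strArray.Length, "Array length should be 14, since we have 7 fields.");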

Dictionary<string, string> fields = new Dictionary<string, string>();

//Now put it into a dictionary which should be easier to work with than an array
for (int i = 0; i <= strArray.Length-2; i+=2)
{
    fields.Add(strArray[i].Trim('"').Trim(','), strArray[i + 1].Trim('"'));
}

That doesn't really work.
It leaves a lot of quotes and commas behind and doesn't seem particularly well-formed.
I'm not good with regex, so I can't put together a pattern that does what I need.
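The leftover quotes and commas come from how Regex.Split treats a capturing group: the text matched by the parenthesized delimiter is returned as its own array element, and because the first match sits at position 0, the array begins with an empty string. For the sample message that yields 15 elements rather than 14, so pairing from index 0 puts each value in the key slot and each tag in the value slot. A sketch that keeps the original pattern but pairs from index 1 (SplitPairing is a hypothetical name, and it assumes a complete message, since a missing closing quote would break the trimming):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitPairing
{
    // The pattern from the question. The capturing group makes Regex.Split
    // return each matched tag (e.g. 0,") as a separate element.
    private const string TagPattern = @"(\d?\d?\d?\d?-?\d?\d,"")";

    public static Dictionary<string, string> Parse(string receivedData)
    {
        string[] parts = Regex.Split(receivedData, TagPattern);
        var fields = new Dictionary<string, string>();

        // parts[0] is the empty string before the first match, so tags sit at
        // odd indexes and each value at the following even index.
        for (int i = 1; i + 1 < parts.Length; i += 2)
        {
            string tag = parts[i].Substring(0, parts[i].Length - 2);            // drop trailing ,"
            string value = parts[i + 1].Substring(0, parts[i + 1].Length - 1);  // drop closing "
            fields.Add(tag, value);
        }
        return fields;
    }
}
```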

I don't even know if this is the best approach.

Any help is appreciated.

Try this expression:

\d*(-\d*)?,"[^"]*"

Match count: 7

0,"120"
1,"Data Field 1"
2,"2401"
3,"Data Field 3"
1403-1,"multiple occurence 1"
1403-2,"multiple occurence 2"
99,""
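A short C# sketch of how this expression might be used with Regex.Matches, splitting each match at its first comma into a tag and a value. MatchFields is a hypothetical name; note that \d* also accepts an empty tag, so the sketch does not validate tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchFields
{
    // The expression from this answer: \d*(-\d*)?,"[^"]*"
    private static readonly Regex Field = new Regex(@"\d*(-\d*)?,""[^""]*""");

    public static List<KeyValuePair<string, string>> Parse(string receivedData)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Field.Matches(receivedData))
        {
            // A match looks like 1403-1,"multiple occurence 1". Tags contain no comma,
            // so the first comma separates the tag from the quoted value.
            int comma = m.Value.IndexOf(',');
            string tag = m.Value.Substring(0, comma);
            // Skip the ," after the tag and drop the closing quote.
            string value = m.Value.Substring(comma + 2, m.Value.Length - comma - 3);
            result.Add(new KeyValuePair<string, string>(tag, value));
        }
        return result;
    }
}
```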

I suggest you use Regex.Matches rather than Regex.Split. That way you can iterate over all the matches and use capture groups to grab exactly the data you want, while keeping the structure intact. The example below includes a regex that should work for this:

        MatchCollection matchlist = Regex.Matches(receivedData, @"(?<tag>\d+(?:-\d+)?),""(?<data>.*?)""");
        foreach (Match match in matchlist)
        {
            string tag = match.Groups["tag"].Value;
            string data = match.Groups["data"].Value;
        }
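Building on that loop, a hypothetical TryParse wrapper could collect the fields into a dictionary and use the presence of the 0 and 99 tags as the completeness check asked about in the question. This is a sketch, not a FedEx API; writing through the dictionary indexer means a repeated tag overwrites the earlier value instead of throwing.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TaggedMessage
{
    // The pattern from this answer, with named groups for the tag and the quoted value.
    private static readonly Regex Field =
        new Regex(@"(?<tag>\d+(?:-\d+)?),""(?<data>.*?)""");

    // Hypothetical wrapper: fills a dictionary with every field and returns true
    // only if both the 0 (start) and 99 (end) fields were found.
    public static bool TryParse(string receivedData, out Dictionary<string, string> fields)
    {
        fields = new Dictionary<string, string>();
        foreach (Match match in Field.Matches(receivedData))
        {
            fields[match.Groups["tag"].Value] = match.Groups["data"].Value;
        }
        return fields.ContainsKey("0") && fields.ContainsKey("99");
    }
}
```

A partially received message such as 0,"120"1,"Data yields only the 0 field, so TryParse returns false and the caller can keep reading.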
