Extracting data from plain text string

I am trying to process a report from a system which gives me the following code:

000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}

I need to extract the values between the curly brackets {} and save them into variables. I assume I will need to do this using regex or similar? I've really no idea where to start! I'm using C# with ASP.NET 4.

I need the following variables:

param1 = 000
param2 = GEN
param3 = OK
param4 = 1 //Q
param5 = 1 //M
param6 = 002 //B
param7 = 3e5e65656-e5dd-45678-b785-a05656569e //I

I will name the params based on what they actually mean. Can anyone please help me here? I have tried to split based on spaces, but I get the other garbage with it!

Thanks for any pointers/help!

If the format is pretty constant, you can use .NET string processing methods to pull out the values, something along the lines of:

string line = 
    "000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}";

int start = line.IndexOf('{');
int end = line.IndexOf('}');
// Length is end - start - 1 so that the closing '}' is not included
string variablePart = line.Substring(start + 1, end - start - 1);
string[] variables = variablePart.Split(' ');
foreach (string variable in variables)
{
    string[] parts = variable.Split('=');
    // parts[0] holds the variable name, parts[1] holds the value
}

Wrote this off the top of my head, so there may be an off-by-one error somewhere. Also, it would be advisable to add error checking, e.g. to make sure the input string has both a { and a }.
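The error checking suggested above could look roughly like the sketch below. The class name `ReportLineParser` and the method `TryParseVariables` are hypothetical names chosen for illustration, and the sketch assumes each entry inside the braces is a `name=value` pair separated by spaces, as in the sample line.

```csharp
using System;
using System.Collections.Generic;

public static class ReportLineParser
{
    // Hypothetical helper: returns false instead of throwing when the
    // braces are missing or an entry is not of the form name=value.
    // On failure, 'values' may contain the entries parsed so far.
    public static bool TryParseVariables(string line, out Dictionary<string, string> values)
    {
        values = new Dictionary<string, string>();
        if (line == null) return false;

        int start = line.IndexOf('{');
        if (start < 0) return false;

        // Search for the closing brace only after the opening one.
        int end = line.IndexOf('}', start + 1);
        if (end < 0) return false;

        // Text strictly between '{' and '}'.
        string variablePart = line.Substring(start + 1, end - start - 1);

        string[] variables = variablePart.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string variable in variables)
        {
            // Split on the first '=' only, so a value may itself contain '='.
            string[] parts = variable.Split(new[] { '=' }, 2);
            if (parts.Length != 2) return false;
            values[parts[0]] = parts[1];
        }
        return true;
    }
}
```

With the sample line, `TryParseVariables` returns true and the dictionary holds the keys Q, M, B and I. The part before the braces (000, GEN, OK) is not handled by this sketch.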

Use a regular expression.

Quick and dirty attempt:

(?<ID1>[0-9]*)=\[(?<GEN>[a-zA-Z]*)\] OK {Q=(?<Q>[0-9]*) M=(?<M>[0-9]*) B=(?<B>[0-9]*) I=(?<I>[a-zA-Z0-9\-]*)}

This will generate named groups called ID1, GEN, Q, M, B and I.
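As a minimal sketch of how named groups can be read back in C#: the pattern below is an adaptation of the one above, not the original. It escapes the braces, anchors the match, uses `+` instead of `*`, and adds a `Status` group in place of the hard-coded OK. The names `NamedGroupExample`, `GetGroup` and `Status` are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupExample
{
    // Adapted pattern (assumption): escaped braces, anchors, and an extra
    // "Status" group capturing the word that follows the bracketed part.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<ID1>[0-9]+)=\[(?<GEN>[a-zA-Z]+)\] (?<Status>[a-zA-Z]+) " +
        @"\{Q=(?<Q>[0-9]+) M=(?<M>[0-9]+) B=(?<B>[0-9]+) I=(?<I>[a-zA-Z0-9\-]+)\}$");

    // Returns the text captured by the named group, or null when the
    // line does not match the pattern at all.
    public static string GetGroup(string line, string groupName)
    {
        Match match = LinePattern.Match(line);
        return match.Success ? match.Groups[groupName].Value : null;
    }
}
```

For the sample line, `NamedGroupExample.GetGroup(line, "B")` gives "002" and `GetGroup(line, "GEN")` gives "GEN". In real code you would probably call `Match` once and read all groups from the same `Match` object rather than re-matching per group.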

Check out the MSDN docs for details on using Regular Expressions in C#.

You can use Regex Hero for quick C# regex testing.

I would suggest a regular expression for this type of work.

var objRegex = new System.Text.RegularExpressions.Regex(@"^(\d+)=\[([A-Z]+)\] ([A-Z]+) \{Q=(\d+) M=(\d+) B=(\d+) I=([a-z0-9\-]+)\}$");
var objMatch = objRegex.Match("000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}");
if (objMatch.Success)
{
    Console.WriteLine(objMatch.Groups[1].ToString());
    Console.WriteLine(objMatch.Groups[2].ToString());
    Console.WriteLine(objMatch.Groups[3].ToString());
    Console.WriteLine(objMatch.Groups[4].ToString());
    Console.WriteLine(objMatch.Groups[5].ToString());
    Console.WriteLine(objMatch.Groups[6].ToString());
    Console.WriteLine(objMatch.Groups[7].ToString());
}

I've just tested this out and it works well for me.

You can use String.Split:

string[] parts = s.Split(new string[] {"=[", "] ", " {Q=", " M=", " B=", " I=", "}"},
                         StringSplitOptions.None);
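To make the result of that split concrete, here is a small sketch wrapping the same delimiter list in a method. The names `SplitExample` and `SplitReportLine` are hypothetical. Unlike the snippet above, it uses `StringSplitOptions.RemoveEmptyEntries`, which is a deliberate change: with `None`, the array ends with an empty string after the closing brace.

```csharp
using System;

public static class SplitExample
{
    // Same delimiters as in the snippet above.
    private static readonly string[] Delimiters =
        { "=[", "] ", " {Q=", " M=", " B=", " I=", "}" };

    public static string[] SplitReportLine(string s)
    {
        // RemoveEmptyEntries drops the empty string that would otherwise
        // follow the closing brace. Caveat: it would also drop a genuinely
        // empty value (e.g. "Q= M=1") and shift the later indices.
        return s.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the sample line this yields seven elements in order: 000, GEN, OK, 1, 1, 002 and the I value, so `parts[5]` is "002". The approach relies on the field order being fixed and on no value containing one of the delimiter strings.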

This solution breaks up your report code into segments and stores the desired values in an array.

The regular expression matches one report code segment at a time and stores the appropriate values in the "Parsed Report Code Array" (the parsedCodeSegmentElements list in the code).

As your example implied, the first two code segments are treated differently from the ones after them. I made the assumption that it is always the first two segments that are processed differently.

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private static string[] ParseReportCode(string reportCode) {
    const int FIRST_VALUE_ONLY_SEGMENT = 3;
    const int GRP_SEGMENT_NAME = 1;
    const int GRP_SEGMENT_VALUE = 2;
    Regex reportCodeSegmentPattern = new Regex(@"\s*([^\}\{=\s]+)(?:=\[?([^\s\]\}]+)\]?)?");
    Match matchReportCodeSegment = reportCodeSegmentPattern.Match(reportCode);

    List<string> parsedCodeSegmentElements = new List<string>();
    int segmentCount = 0;
    while (matchReportCodeSegment.Success) {
        if (++segmentCount < FIRST_VALUE_ONLY_SEGMENT) {
            string segmentName = matchReportCodeSegment.Groups[GRP_SEGMENT_NAME].Value;
            parsedCodeSegmentElements.Add(segmentName);
        }
        string segmentValue = matchReportCodeSegment.Groups[GRP_SEGMENT_VALUE].Value;
        if (segmentValue.Length > 0) parsedCodeSegmentElements.Add(segmentValue);
        matchReportCodeSegment = matchReportCodeSegment.NextMatch();
    }
    return parsedCodeSegmentElements.ToArray();
}
