
How can I parse substrings with a regular expression?

My example unparsed data is:

"8$154#3$021308831#7$NAME SURNAME#11$2166220160#10$5383237309#52$05408166#"

I want to parse the data between each $ and the following #. I want to see a result like this:

Between 8$ and # -> My data is 154,
Between 3$ and # -> My data is 021308831,
Between 7$ and # -> My data is NAME SURNAME,
Between 11$ and # -> My data is 2166220160,
Between 10$ and # -> My data is 5383237309,
Between 52$ and # -> My data is 05408166.

Thanks for your replies.

(\d+\$)(.*?)#

See it on Rubular.

You will find the first part (e.g. 8$) in capturing group 1 and the corresponding data in group 2.

The parentheses make the regex store what they match in those capturing groups. \d+ matches one or more digits, \$ matches a literal dollar sign, and .*? is a lazy match for everything up to the next #.
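To see why the lazy quantifier matters, here is a minimal sketch (C# 9+ top-level statements, System.Text.RegularExpressions.Regex; the variable names are illustrative) that runs the pattern on the question's data and, for comparison, a greedy variant with .* instead of .*?. The greedy version runs on to the last # in the string, so it yields one big match instead of six.

```csharp
using System;
using System.Text.RegularExpressions;

string data = "8$154#3$021308831#7$NAME SURNAME#11$2166220160#10$5383237309#52$05408166#";

// Lazy: .*? stops at the FIRST '#' after each "digits$" prefix.
MatchCollection lazy = Regex.Matches(data, @"(\d+\$)(.*?)#");

// Greedy (for comparison only): .* runs to the end and backtracks just to the LAST '#'.
MatchCollection greedy = Regex.Matches(data, @"(\d+\$)(.*)#");

foreach (Match m in lazy)
{
    // Group 1 holds the key with its '$' (e.g. "8$"), group 2 the data.
    Console.WriteLine($"Group 1: {m.Groups[1].Value}  Group 2: {m.Groups[2].Value}");
}

Console.WriteLine($"Lazy matches: {lazy.Count}, greedy matches: {greedy.Count}");
```

The last line prints "Lazy matches: 6, greedy matches: 1"; the single greedy match has everything from 154 up to 05408166 in group 2.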

You can split the string into an array on #. With

String[] entries = data.Split('#');

you will get an array with "8$154", "3$021308831", etc., plus an empty last element, because the string ends with #.

Now you just work with the entries and split each one at the dollar sign:

String[] tmp = entries[0].Split('$');

So you get:

tmp[0] = "8";
tmp[1] = "154";

Add some checks and it should work. I don't think a regex is needed here.

If you have "8$15$4#3$021308831", then splitting the first entry gives tmp:

tmp[0] = "8"; // your key!
tmp[1] = "15"; // data part
tmp[2] = "4"; // data part ($ is missing!)

So you would have to concatenate all tmp elements from index 1 onward, putting the $ back between them:

StringBuilder value = new StringBuilder();
for(int i = 1; i < tmp.Length; i++)
{
    if(i > 1) value.Append("$");
    value.Append(tmp[i]);
}
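Putting the split approach together with a couple of checks built in might look like this. It is a minimal sketch assuming C# 9+ top-level statements; ParseEntries is a hypothetical helper name. Instead of re-joining the pieces, it passes a count of 2 to Split so each entry is cut only at its first $, and it drops the empty element left by the trailing #.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits on '#', then cuts each entry at its FIRST '$' only,
// so a '$' inside the value survives without any re-joining.
static List<(string Key, string Value)> ParseEntries(string data)
{
    var result = new List<(string Key, string Value)>();

    // RemoveEmptyEntries drops the empty element produced by the trailing '#'.
    foreach (string entry in data.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // count = 2: at most two parts, the key and everything after the first '$'.
        string[] parts = entry.Split(new[] { '$' }, 2);

        // Check: skip malformed entries that have no '$' or an empty key.
        if (parts.Length != 2 || parts[0].Length == 0)
        {
            continue;
        }

        result.Add((parts[0], parts[1]));
    }

    return result;
}

foreach (var (key, value) in ParseEntries("8$15$4#3$021308831#"))
{
    Console.WriteLine($"Between {key}$ and # -> My data is {value}");
}
```

For "8$15$4#3$021308831#" the value for key 8 comes out as 15$4, with the inner $ intact.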
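
Alternatively, split on both separators in one call:

using System;

class Program
{
    static void Main(string[] args)
    {
        string text = "8$154#3$021308831#7$NAME SURNAME#11$2166220160#10$5383237309#52$05408166#";
        // Splitting on both '$' and '#' alternates key, value, key, value, ...
        // The trailing '#' leaves one empty element at the end, hence Length - 1.
        // Note: a '$' inside a value would shift the key/value pairing.
        string[] values = text.Split('$', '#');
        for (var i = 0; i < values.Length - 1; i += 2)
        {
            Console.WriteLine("Between " + values[i] + "$ and # -> My data is " + values[i + 1]);
        }
        Console.ReadLine();
    }
}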

OK, taking stema's expression, which works:

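using System.Text;
using System.Text.RegularExpressions;

static string FormatEntries(string nonParsed)
{
    // Group 1 = key with its "$" (e.g. "8$"), group 2 = data up to the next '#'.
    MatchCollection matches = Regex.Matches(nonParsed, @"(\d+\$)(.*?)#");

    StringBuilder result = new StringBuilder();

    for (int i = 0; i < matches.Count; i++)
    {
        Match match = matches[i];

        result.AppendFormat("Between {0} and # -> My data is {1}",
            match.Groups[1].Value,
            match.Groups[2].Value);

        // Comma and line break after every entry except the last, which ends with "."
        if (i < matches.Count - 1)
        {
            result.AppendLine(",");
        }
        else
        {
            result.Append(".");
        }
    }

    return result.ToString();
}

// "8$..." stands for the full data string from the question.
string output = FormatEntries("8$...");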

Thanks to stema, this copes with a $ appearing within the value.
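If you would rather look values up by key, a small variation collects the matches into a Dictionary. This is a sketch under two assumptions: that you want the numeric tag without its $ (so \$ is moved outside group 1, giving (\d+)\$(.*?)#), and that a repeated tag may simply overwrite the earlier value. ParseFields is a hypothetical name; C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper. Variant of stema's pattern with \$ moved out of group 1,
// so group 1 is just the numeric tag and group 2 is the data.
static Dictionary<string, string> ParseFields(string data)
{
    var fields = new Dictionary<string, string>();
    foreach (Match m in Regex.Matches(data, @"(\d+)\$(.*?)#"))
    {
        // Indexer assignment: a repeated tag overwrites the earlier value instead of throwing.
        fields[m.Groups[1].Value] = m.Groups[2].Value;
    }
    return fields;
}

var parsed = ParseFields("8$15$4#3$021308831#7$NAME SURNAME#");
Console.WriteLine(parsed["8"]); // 15$4 : the lazy .*? runs past the inner '$' and stops at '#'
Console.WriteLine(parsed["7"]); // NAME SURNAME
```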

If you want to use a regular expression, you should do it like this:

\$([\w\d\s]+)\#

This will match between $ and #:

\$(.*?)#
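The two patterns above behave differently when a value contains $. The sketch below (C# 9+ top-level statements; Captures is a hypothetical helper) collects group 1 of each pattern on a sample whose first value is 15$4. The character class [\w\d\s] cannot match $, so the first pattern only finds the $4# tail of that entry, while the lazy .*? keeps 15$4. Neither pattern captures the key (8, 3, ...); stema's (\d+\$)(.*?)# does. Also, \d is already included in \w, so [\w\s] would be equivalent to [\w\d\s].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns group 1 of every match of the pattern.
static string[] Captures(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

string sample = "8$15$4#3$021308831#";

string[] classVersion = Captures(sample, @"\$([\w\d\s]+)\#");
string[] lazyVersion = Captures(sample, @"\$(.*?)#");

Console.WriteLine(string.Join(" | ", classVersion)); // 4 | 021308831
Console.WriteLine(string.Join(" | ", lazyVersion));  // 15$4 | 021308831
```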

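Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM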