
Extract a list of substrings from a string and remove them from the string

I have a string and must extract all the substrings between two different words ('alpha' and 'beta'). I must return a JSON object with two fields.

I tried it this way, but it doesn't work correctly:

string content = "string working on";
var listSubString = new List<string>();
int index = 0;
do
{
    index = content.LastIndexOf("alpha");
    if (index != -1)
    {
        var length = content.IndexOf("beta");
        string substring = content.Substring(index, length);
        content = content.Replace(substring, string.Empty);
        listSubString.Add(substring.Replace("alpha", string.Empty).Replace("beta", string.Empty));
    }
} while (index != -1);
Content = content;
ListSubString = listSubString;

For a string like "hello alpha I don't want this part 1 beta world alpha i don't want this part 2 beta have a nice day", I'd like to receive a JSON like {Content: "hello world have a nice day", ListSubString: ["i don't want this part 1", "i don't want this part 2"]}

Thanks for the help.

I got the output you want; I hope it solves your problem.

Link: https://dotnetfiddle.net/b8T8qy

This is the part of your code that I changed:

string substring= content.Substring(index, length-1);
listSubString.Insert(0, substring.Replace("alpha", string.Empty).Replace("beta", string.Empty));

Finally, I built the output as a JSON string:

string json = string.Join("\",\"", listSubString);
string otp = "{\"Content\" : \""+content+"\",\"ListSubString\": [\""+json+"\"]}";

Output:

{
  "Content": "hello world have a nice day",
  "ListSubString": [
    " I don't want this part 1  ",
    " i don't want this part 2  "
  ]
}
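Building JSON by string concatenation, as above, stops producing valid JSON as soon as a value contains a double quote or a backslash. A minimal sketch of the same step with a serializer, assuming `System.Text.Json` is available (.NET Core 3.0 or later); the class and method names `JsonOutputSketch` and `ToJson` are hypothetical and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class JsonOutputSketch
{
    // Hypothetical helper: serializes the two fields the question asks for.
    // The serializer escapes quotes and backslashes inside the values.
    public static string ToJson(string content, List<string> listSubString)
    {
        return JsonSerializer.Serialize(new { Content = content, ListSubString = listSubString });
    }

    public static void Main()
    {
        var list = new List<string> { "I don't want this part 1", "i don't want this part 2" };
        Console.WriteLine(ToJson("hello world have a nice day", list));
    }
}
```

Note that the default encoder may write some characters (such as the apostrophe) as `\uXXXX` escapes; the result is still valid JSON and parses back to the same text.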

Regular expressions allow you to accomplish this without indexes and loops.

Once you have identified a pattern that describes the substrings you want to extract, e.g. "alpha.*?beta", rebuilding the content without those substrings is just a matter of concatenating the fragments produced by splitting on that regular expression:

Content = string.Join(string.Empty, new Regex("alpha.*?beta").Split(text));

As for the substrings themselves, you can capture them in the pattern and extract them from the matches returned by the regular expression:

ListSubString = new Regex("alpha(.*?)beta")
    .Matches(text)
    .Select(match => match.Groups[1])
    .SelectMany(group => group.Captures.OfType<Capture>())
    .Select(capture => capture.Value)
    .ToList();

You can have a look at this answer for some clarification on the Match > Group > Capture hierarchy.
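Putting the two regex steps together, here is a minimal, self-contained sketch run against the example string from the question. The names `RegexExtraction` and `Extract` are hypothetical. It also makes two additions that are assumptions beyond the answer above: it trims each captured substring, and it collapses the runs of whitespace left behind in the content. `Cast<Match>()` is included so the LINQ chain also compiles on .NET Framework, where `MatchCollection` is not generic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexExtraction
{
    // Hypothetical helper combining the Split and Matches steps.
    public static (string Content, List<string> ListSubString) Extract(string text)
    {
        // "alpha(.*?)beta": the lazy ".*?" stops at the first "beta" after each "alpha".
        var captured = new Regex("alpha(.*?)beta")
            .Matches(text)
            .Cast<Match>()
            .Select(match => match.Groups[1].Value.Trim())
            .ToList();

        // Split with a pattern that has no capture group, so the removed text
        // is not returned among the fragments.
        var joined = string.Join(string.Empty, new Regex("alpha.*?beta").Split(text));

        // Assumption: collapse the doubled spaces left where a block was removed.
        var content = Regex.Replace(joined, @"\s+", " ").Trim();
        return (content, captured);
    }

    public static void Main()
    {
        var result = Extract("hello alpha I don't want this part 1 beta world alpha i don't want this part 2 beta have a nice day");
        Console.WriteLine(result.Content);                          // hello world have a nice day
        Console.WriteLine(string.Join(" | ", result.ListSubString)); // I don't want this part 1 | i don't want this part 2
    }
}
```

By default `.` does not match a newline, so if a block can span several lines the pattern would need `RegexOptions.Singleline`.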

Thanks to everyone who responded. In the end, I decided to do it this way:

var splitedContent = content.Split(new string[] { "alpha", "beta" }, StringSplitOptions.None);

Content = string.Join(" ", splitedContent.Where((_, index) => index % 2 == 0));
Css = splitedContent.Where((_, index) => index % 2 != 0).ToList<string>();

Regex is probably the best and most performant solution, but I don't fully understand how it works, so for now this is my solution.
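For reference, a self-contained sketch of this split-based approach run against the example string. It relies on 'alpha' and 'beta' strictly alternating, so that even-indexed pieces are content and odd-indexed pieces are the extracted substrings. The names `SplitExtraction` and `Extract` are hypothetical, and trimming each piece is an added assumption (without it, the joined content contains runs of three spaces and the extracted items keep their surrounding spaces).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitExtraction
{
    // Hypothetical helper wrapping the split-by-both-delimiters idea.
    public static (string Content, List<string> ListSubString) Extract(string content)
    {
        // StringSplitOptions.None keeps empty pieces, which preserves the
        // even/odd parity that the rest of the method depends on.
        var pieces = content
            .Split(new[] { "alpha", "beta" }, StringSplitOptions.None)
            .Select(piece => piece.Trim())
            .ToArray();

        // Even indexes: text outside the delimiters. Skip empty pieces so
        // the joined content has no doubled spaces.
        var kept = pieces.Where((piece, index) => index % 2 == 0 && piece.Length > 0);

        // Odd indexes: text between "alpha" and "beta".
        var removed = pieces.Where((_, index) => index % 2 != 0).ToList();

        return (string.Join(" ", kept), removed);
    }

    public static void Main()
    {
        var result = Extract("hello alpha I don't want this part 1 beta world alpha i don't want this part 2 beta have a nice day");
        Console.WriteLine(result.Content);                          // hello world have a nice day
        Console.WriteLine(string.Join(" | ", result.ListSubString)); // I don't want this part 1 | i don't want this part 2
    }
}
```

If the input can contain a 'beta' before any 'alpha', or two 'alpha' in a row, the parity assumption breaks and the pieces are misclassified; the regex approach does not have that limitation.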
