How can I split a string into a string array in C# without losing the parts that match the split pattern?

Question

What I have

string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
If the URL could also be split off as shown below, that would be even better, but it's fine if not.

s[3] = "We are the Best friends";
s[4] = "http://www.c.com";

What's the problem
I tried to split the string with the code below:

string[] s= Regex.Split(sourceString, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result isn't what I want: Split seems to remove every substring that matches ImageRegPattern, and I want those substrings to stay in the result. I checked the Regex page on MSDN, and there doesn't seem to be a method that does this. So how can I do it?
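As a point of reference, the .NET documentation for Regex.Split states that when the pattern contains capturing parentheses, the captured text is included in the returned array. A minimal sketch, assuming the pattern has no capturing groups of its own, is therefore to wrap the whole alternation in one group. `CapturingSplitDemo` and `SplitKeepingMatches` are illustrative names, not part of the original question.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturingSplitDemo
{
    // Hypothetical helper: splits on `pattern` but keeps each match in the output.
    public static string[] SplitKeepingMatches(string source, string pattern)
    {
        // Wrapping the whole alternation in one capturing group makes
        // Regex.Split insert each matched URL between the surrounding pieces.
        string[] pieces = Regex.Split(source, "(" + pattern + ")", RegexOptions.IgnoreCase);

        // A match at the very start or end of the input yields an empty
        // string there, so drop empty pieces.
        return pieces.Where(p => p.Length > 0).ToArray();
    }
}
```

Called as `CapturingSplitDemo.SplitKeepingMatches(a, ImageRegPattern)`, this should return the four strings listed under "What I want". If the pattern contained further capturing groups, their text would be inserted into the result as well.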

Answer 1

You need something like this method, which first finds all the matches and then collects them into a list together with the unmatched strings between them.

UPDATE: Added a conditional to handle the case where no matches are found.

private static IEnumerable<string> InclusiveSplit
(
    string source, 
    string pattern
)
{
  List<string> parts = new List<string>();
  int currIndex = 0;

  // First, find all the matches. These are your separators.
  MatchCollection matches = 
      Regex.Matches(source, pattern, 
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

  // If there are no matches, there's nothing to split, so just return a
  // collection with just the source string in it.
  if (matches.Count < 1)
  {
    parts.Add(source);
  }
  else
  {
    foreach (Match match in matches)
    {
      // If the match begins after our current index, we need to add the
      // portion of the source string between the last match and the 
      // current match.
      if (match.Index > currIndex)
      {
        parts.Add(source.Substring(currIndex, match.Index - currIndex));
      }

      // Add the matched value, of course, to make the split inclusive.
      parts.Add(match.Value);

      // Update the current index so we know if the next match has an
      // unmatched substring before it.
      currIndex = match.Index + match.Length;
    }

    // Finally, check if there is a bit of unmatched string at the end of the 
    // source string.
    if (currIndex < source.Length)
      parts.Add(source.Substring(currIndex));
  }

  return parts;
}

For your example input, the output is:

[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
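This output leaves the trailing URL attached to the text, so it does not cover the bonus. One possible extension, not part of the original answer, is to add a last alternative `http://[\w\.\/]*` for any other URL; because .NET tries alternatives left to right, the image alternatives still win wherever they match. The sketch below is a compact variant of `InclusiveSplit` under that assumption (the class `BonusSplitDemo` and constant `BonusPattern` are hypothetical). It drops the explicit no-match branch, since the final tail check already returns the whole source when nothing matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BonusSplitDemo
{
    // The question's image pattern plus a hypothetical catch-all for other
    // http:// URLs, tried last.
    public const string BonusPattern =
        @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif|http://[\w\.\/]*";

    // Same idea as InclusiveSplit: keep the matches and the text between them.
    public static List<string> InclusiveSplit(string source, string pattern)
    {
        var parts = new List<string>();
        int currIndex = 0;

        foreach (Match match in Regex.Matches(source, pattern, RegexOptions.IgnoreCase))
        {
            // Unmatched text between the previous match and this one.
            if (match.Index > currIndex)
                parts.Add(source.Substring(currIndex, match.Index - currIndex));

            parts.Add(match.Value);
            currIndex = match.Index + match.Length;
        }

        // Trailing unmatched text; this also covers the "no matches" case.
        if (currIndex < source.Length)
            parts.Add(source.Substring(currIndex));

        return parts;
    }
}
```

With `BonusSplitDemo.BonusPattern` on the question's input, this should yield the five strings from "What I want" plus the bonus split. The catch-all would, however, also swallow letters glued directly onto a non-image URL, such as `http://www.c.comHello`.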

Answer 2

One does not simply underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

(.*?) : group and lazily match everything up to the first capital letter; this group holds the URL
( : start group
- [A-Z] : match one capital letter
- [\w\s]+ : match one or more word characters (a-z, A-Z, 0-9, _) or whitespace characters (space, \t, \n, \r, \f)
- (?=http|$) : lookahead; check that what follows is "http" or the end of the string
) : close group (this group holds the text)

Online demo

Note: This solution matches the string rather than splitting it.
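Because this approach matches rather than splits, the pieces have to be read out of the two groups. A minimal sketch, assuming the character class is meant to be `[A-Z]` and that the regex runs case-sensitively (with IgnoreCase, `[A-Z]` would also match lowercase letters, so the URL group would end at its first letter). `MatchPairsDemo` and `Pieces` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchPairsDemo
{
    // The answer's pattern, deliberately without RegexOptions.IgnoreCase.
    private static readonly Regex UrlThenText =
        new Regex(@"(.*?)([A-Z][\w\s]+(?=http|$))");

    public static List<string> Pieces(string source)
    {
        var pieces = new List<string>();
        foreach (Match m in UrlThenText.Matches(source))
        {
            // Group 1: everything before the capital letter (the URL, if any).
            if (m.Groups[1].Length > 0)
                pieces.Add(m.Groups[1].Value);

            // Group 2: the capitalised text, up to the next "http" or the end.
            pieces.Add(m.Groups[2].Value);
        }
        return pieces;
    }
}
```

Traced by hand on the question's input, this yields the two URL/text pairs, with "We are the Best friends" correctly separated. The trailing `http://www.c.com` is not returned at all, because the pattern requires a capitalised run of text after each URL.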

Answer 3

I think you need a multi-step process: first insert a delimiter around each URL, then split on that delimiter with String.Split:

string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
if (resultString.StartsWith("|"))
    resultString = resultString.Substring(1);
string[] parts = resultString.Split('|');
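A self-contained version of the same two-step idea, as a sketch. It swaps the `|` delimiter for U+001F (the ASCII unit separator), on the assumption that this control character never appears in the input, and uses `StringSplitOptions.RemoveEmptyEntries` in place of the `StartsWith` check, so a URL at the end of the string or two adjacent URLs also leave no empty entries. `DelimiterSplitDemo` and `SplitKeepingUrls` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterSplitDemo
{
    // Assumption: U+001F never occurs in the input text.
    private const char Delimiter = '\u001F';

    public static string[] SplitKeepingUrls(string rawString)
    {
        // Step 1: surround every image URL with the delimiter.
        string marked = Regex.Replace(
            rawString,
            @"(http://.*?/\w+\.(jpg|png|gif))",
            Delimiter + "$1" + Delimiter,
            RegexOptions.IgnoreCase);

        // Step 2: split on the delimiter; RemoveEmptyEntries drops the empty
        // pieces left by a URL at either end or by two adjacent URLs.
        return marked.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

On the question's input this should give the same four pieces as the `InclusiveSplit` output, since `http://www.c.com` has no image extension and is not marked.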

Answer 4

The obvious answer here is, of course, not to use Split at all, but to match the image patterns and retrieve them. That said, using Split is not impossible:

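// Non-capturing (?:...) groups: Regex.Split adds the text of any capturing group to its result.
string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";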

This matches any position in the string that is either followed by an image URL or preceded by .jpg, .gif or .png.

I really don't recommend doing it this way; I'm just saying you can.
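A sketch of this split-based approach as a complete method. One caveat: Regex.Split copies the text of every capturing group into its result, and groups inside lookarounds still capture, so this version writes them as non-capturing `(?:...)`. It also filters out the empty first element produced by the split point at index 0. `LookaroundSplitDemo` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Zero-width split points: just before an image URL, or just after an
    // image extension. All groups are non-capturing.
    private const string SplitPoints =
        @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";

    public static string[] Split(string source) =>
        Regex.Split(source, SplitPoints, RegexOptions.IgnoreCase)
             .Where(p => p.Length > 0) // a split point at index 0 yields a leading ""
             .ToArray();
}
```

On the question's input this should also give four pieces, with `http://www.c.com` still attached to the last one because it has no image extension.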

4 answers

Answer 1: 4 votes, accepted, 2013-05-29 19:01:06

Answer 2: 1 vote, 2013-05-29 19:15:54

Answer 3: 0 votes, 2013-05-29 18:59:53

Answer 4: 0 votes, 2013-05-29 18:59:53
