
simple rookie regex help needed

I have a simple regex below to pull out the value within a string that is surrounded by end**end, as in the example below. However, although it's stupidly simple, I'm struggling to get the result I need! Is there something obvious I'm missing? Many thanks as always.

var str = "endhelloend";
var match = Regex.Match(str, @"end([a-z]+)end$", RegexOptions.IgnoreCase);

if(match.Success)
{
    result = match.Groups[0].Value;  // should return 'hello'
}

Your pattern correctly contains the group you want to extract. A regular expression match will contain a collection of groups for you to access. In your example, try the following:

var str = "endhelloend";
var match = Regex.Match(str, @"end([a-z]+)end$", RegexOptions.IgnoreCase);

if(match.Success)
{
    var hello = match.Groups[1].Value;  // "hello"
}

match.Groups[0] returns the entire match, "endhelloend", so what you want is the first capture group within the match, match.Groups[1].
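To see the difference between the two indexes side by side, here is a minimal sketch written as a C# top-level program. The named group `word` in the second half is my own addition for illustration and is not part of the original pattern; it is just another way to reach the same capture without counting parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

var str = "endhelloend";

// Numbered groups: index 0 is the whole match, index 1 is the first (...) pair.
var match = Regex.Match(str, @"end([a-z]+)end$", RegexOptions.IgnoreCase);
string whole = match.Groups[0].Value;
string inner = match.Groups[1].Value;
Console.WriteLine(whole); // endhelloend
Console.WriteLine(inner); // hello

// Same pattern with a named group ("word" is a hypothetical name chosen here).
var namedMatch = Regex.Match(str, @"end(?<word>[a-z]+)end$", RegexOptions.IgnoreCase);
string named = namedMatch.Groups["word"].Value;
Console.WriteLine(named); // hello
```

Note that match.Value is the same as match.Groups[0].Value, which is why the original code printed the surrounding 'end' markers as well.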

match.Groups[0] holds the match for the entire regular expression. Look at match.Groups[1] instead.

I think this line should read: result = match.Groups[1].Value;

I see you're struggling with this, so I will offer a little insight.

The regex end([a-z]+)end$ will match the string "endhelloend".
The inner text will be in capture group 1.
It will not match when that same text is only a substring of a longer string, like
"endhelloend of the world".

The reason is that you have an end-of-string metacharacter (assertion) $ as part of the regex,
just after 'end'.

So you could just take the $ out of the regex and it should work fine.
There are other things to take into account though. I'll comment them in your regex.

end        // Find a literal 'end'
(          // Capture group 1 open
  [a-z]+   // Find as many characters a-z as possible (including 'e', 'n', 'd' in sequence)
)          // Capture group 1 close
end        // Find a literal 'end'
$          // End-of-string assertion (the last 'end' must sit at the very end of the string, or just before a final newline)
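A rough sketch of both points, the $ anchor and the greedy [a-z]+, as a C# top-level program. The helper `Inner` and the extra test string "endhelloendworldend" are made up here for illustration; only the two patterns come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns capture group 1, or a marker when nothing matches.
static string Inner(string input, string pattern)
{
    var m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : "(no match)";
}

const string anchored = @"end([a-z]+)end$";   // original pattern
const string unanchored = @"end([a-z]+)end";  // same pattern with $ removed

string r1 = Inner("endhelloend", anchored);
string r2 = Inner("endhelloend of the world", anchored);
string r3 = Inner("endhelloend of the world", unanchored);
string r4 = Inner("endhelloendworldend", unanchored);

Console.WriteLine(r1); // hello
Console.WriteLine(r2); // (no match)  because of $
Console.WriteLine(r3); // hello
Console.WriteLine(r4); // helloendworld  because [a-z]+ is greedy
```

The last line is the "other thing to take into account": the greedy group swallows an inner 'end' and only gives back enough characters for the final literal 'end' to match.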

Use solution 1 to extract the text content from the .html file, then use solution 2 to filter the text you want out of it.

  1. To strip the HTML elements from the contents of a .htm file, try this:

     string CleanXml(string DirtyXml)
     {
         int startloc = 0;
         for (int x = 0; x < DirtyXml.Length; x++)
         {
             if (DirtyXml[x] == '<')
             {
                 startloc = x;  // remember where the most recent tag opens
             }
             else if (DirtyXml[x] == '>')
             {
                 // cut the tag out, then rescan from the start of the string
                 DirtyXml = DirtyXml.Remove(startloc, (x - startloc) + 1);
                 x = -1;        // the loop's x++ brings this back to 0
             }
         }
         return DirtyXml;
     }
  2. Regex to filter the text "endhelloend" to obtain "hello":

      string result = "";
      var str = "endhelloend";
      var match = Regex.Match(str, @"end([a-z]+)end$", RegexOptions.IgnoreCase);
      if (match.Success)
      {
          result = match.Groups[1].Value; // Returns 'hello'
      }
      Console.WriteLine(result);
      Console.ReadLine();

Try this. It will give you any alphabetic characters between the two 'end' words, but it will not capture the actual word 'end':

(?<=end)[a-z]+?(?=end)
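A small sketch of how this lookaround pattern could be used from C#, written as a top-level program. Because the lookbehind and lookahead consume no characters, the text you want is the whole match (match.Value), and no capture group is needed. The second input string is only there to show that, with no $ in the pattern, trailing text does not matter.

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"(?<=end)[a-z]+?(?=end)";

var first = Regex.Match("endhelloend", pattern, RegexOptions.IgnoreCase);
var second = Regex.Match("endhelloend of the world", pattern, RegexOptions.IgnoreCase);

string firstValue = first.Value;
string secondValue = second.Value;
int groupCount = first.Groups.Count;

Console.WriteLine(firstValue);  // hello
Console.WriteLine(secondValue); // hello
Console.WriteLine(groupCount);  // 1 (only group 0, the match itself)
```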
