Let's say I have this HTML:
<a href="http://google.com">this is a search engine</a>
How can I look for "engine" and then match everything backwards until "this" is reached?
I know I can use: this.*?engine
- but that matches from left to right ("ahead" matching). Here I want to read backwards, if that is possible at all.
You could reverse all the strings and perform a normal search:
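// Requires: using System.Linq; using System.Text.RegularExpressions;
string text = @"<a href=""http://google.com""> this is a search engine </a>";
string engine = "engine";
string strThis = "this";
// Reverse the text and both words, match left to right, then reverse the match back.
// The lazy .+? stops at the first reversed "this", i.e. the one nearest to "engine".
string result = new string(
    Regex.Match(
        new string(text.Reverse().ToArray()),
        new string(engine.Reverse().ToArray()) + ".+?" + new string(strThis.Reverse().ToArray()))
    .Value
    .Reverse()
    .ToArray());
// result is "this is a search engine"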
Also, to make the code clearer, you could define an extension method on string that reverses the string and returns a string instead of IEnumerable<char>. See this for reference.
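A minimal sketch of such an extension method, assuming the hypothetical names StringExtensions and ReverseString (neither comes from the answer above). It reverses UTF-16 code units, so it is only safe for plain text without surrogate pairs or combining characters:

```csharp
using System;
using System.Text.RegularExpressions;

// Usage: reverse the inputs, match left to right, then reverse the match back.
string text = @"<a href=""http://google.com""> this is a search engine </a>";
string pattern = Regex.Escape("engine".ReverseString())
               + ".+?"
               + Regex.Escape("this".ReverseString());
string result = Regex.Match(text.ReverseString(), pattern).Value.ReverseString();
Console.WriteLine(result); // this is a search engine

// Hypothetical helper class; the names are illustrative only.
public static class StringExtensions
{
    // Returns a new string with the characters of s in reverse order.
    // Naive: operates on UTF-16 code units, not on grapheme clusters.
    public static string ReverseString(this string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

Escaping is applied after reversing, because reversing an already escaped string would put the backslashes on the wrong side of the characters they protect.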
First, always parse HTML with a dedicated tool; see "What is the best way to parse html in C#?" for possible options.
Once the HTML is parsed you can get plain text to run your regex against.
You may still use your this.*?engine regex, but enable the RegexOptions.RightToLeft option, possibly coupled with RegexOptions.Singleline so that . also matches line breaks between the two words:
var result = Regex.Match(text, @"this.*?engine", RegexOptions.Singleline | RegexOptions.RightToLeft).Value;
See the online regex demo.
As per the documentation, the Regex.RightToLeft property "gets a value that indicates whether the regular expression searches from right to left"; passing RegexOptions.RightToLeft is what turns that behavior on.
C# demo:
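// Requires: using System; using System.Text.RegularExpressions;
var text = "blah blah this is a this search engine blah";
// Regex.Match never returns null, so no null-conditional operator is needed before .Value.
var result = Regex.Match(text, @"this.*?engine",
    RegexOptions.Singleline | RegexOptions.RightToLeft).Value;
Console.WriteLine(result); // => this search engine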
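To see why the option matters, here is a small sketch that runs the same pattern in both directions on the demo input. The variable names leftToRight and rightToLeft are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "blah blah this is a this search engine blah";
string pattern = @"this.*?engine";

// Left to right: the match starts at the first "this", so even the lazy .*?
// has to extend across the second "this" before it reaches "engine".
string leftToRight = Regex.Match(text, pattern, RegexOptions.Singleline).Value;

// Right to left: the match starts at "engine" and grows backwards,
// so the lazy .*? stops at the nearest preceding "this".
string rightToLeft = Regex.Match(text, pattern,
    RegexOptions.Singleline | RegexOptions.RightToLeft).Value;

Console.WriteLine(leftToRight); // this is a this search engine
Console.WriteLine(rightToLeft); // this search engine
```

The lazy quantifier alone only minimizes how far a match extends from its starting point; it does not pick a later starting point. Changing the search direction is what moves the start to "engine".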