What regex to use in C# to start matching from a word BEHIND (matching backwards...) until a match?

Question

Say we have this HTML:

<a href="http://google.com">this is a search engine</a>

How can I look for "engine" and match everything back until "this" is reached?

I know I can use this.*?engine, but that matches from left to right, that is, "ahead". Here I want to read backwards, if that is possible at all.

Answer 1

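You could reverse all the strings and perform a normal search:

using System.Linq;                      // Enumerable.Reverse on string
using System.Text.RegularExpressions;

string text = @"<a href=""http://google.com""> this is a search engine </a>";
string engine = "engine";
string strThis = "this";

// Reverse the input and both words, match left to right, then reverse the match back.
// ".+?" is lazy, so the match stops at the "this" nearest to "engine".
string result = new string(
  Regex.Match(
    new string(text.Reverse().ToArray()),
    new string(engine.Reverse().ToArray()) + ".+?" + new string(strThis.Reverse().ToArray()))
  .Value
  .Reverse()
  .ToArray());
// result: "this is a search engine"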

Also, to make the code clearer, you could define an extension method on string that reverses the string and returns a string instead of an IEnumerable<char>. See this for reference.
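A minimal sketch of such an extension method, together with a small wrapper around the reverse-and-search idea. The names ReverseString, BackwardSearch and Find are hypothetical, chosen only for illustration; the reversal works on UTF-16 code units, so it assumes the text contains no surrogate pairs or combining characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Hypothetical helper: returns a new string with the chars in reverse order.
    // Assumption: no surrogate pairs or combining marks in the input.
    public static string ReverseString(this string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}

public static class BackwardSearch
{
    // Hypothetical wrapper: finds lastWord, then extends the match backwards
    // to the nearest firstWord, by searching the reversed text left to right.
    public static string Find(string text, string lastWord, string firstWord)
    {
        // Escape after reversing so each word is treated as a literal.
        string pattern = Regex.Escape(lastWord.ReverseString())
                       + ".+?"
                       + Regex.Escape(firstWord.ReverseString());
        return Regex.Match(text.ReverseString(), pattern).Value.ReverseString();
    }
}
```

With this in place, BackwardSearch.Find(text, "engine", "this") replaces the nested new string(...Reverse().ToArray()) calls, and it returns an empty string when nothing matches.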

Answer 2

First, always parse HTML with a dedicated tool; see "What is the best way to parse html in C#?" for possible options.

Once the HTML is parsed you can get plain text to run your regex against.
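As a rough illustration of getting plain text before running the regex, the sketch below uses System.Xml.Linq from the standard library. This is an assumption made to keep the example dependency-free: it only works when the fragment is well-formed XML, and it is not a substitute for a real HTML parser. The PlainText class and its method name are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class PlainText
{
    // Hypothetical helper. Assumption: the fragment is well-formed XML;
    // XElement.Parse throws an XmlException otherwise.
    public static string FromWellFormedFragment(string fragment)
    {
        // Value is the concatenated text content of the element.
        return XElement.Parse(fragment).Value;
    }
}
```

For the anchor element from the question, PlainText.FromWellFormedFragment returns "this is a search engine", which can then be passed to Regex.Match.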

You can still use your this.*?engine regex, but enable the RegexOptions.RightToLeft option, possibly together with RegexOptions.Singleline so that . also matches line breaks between the two words:

var result = Regex.Match(text, @"this.*?engine", RegexOptions.Singleline | RegexOptions.RightToLeft).Value;

See the online regex demo.

As per the documentation, the Regex.RightToLeft property, which reflects the RegexOptions.RightToLeft option,

"Gets a value that indicates whether the regular expression searches from right to left."

C# demo:

var text = "blah blah this is a this search engine blah";
// Regex.Match never returns null, so .Value is safe (empty string when nothing matches).
var result = Regex.Match(text, @"this.*?engine",
        RegexOptions.Singleline | RegexOptions.RightToLeft).Value;
Console.WriteLine(result); // => this search engine
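To make the effect of the option easier to see, here is a small sketch that runs the same pattern in both directions. The DirectionDemo class and its member names are hypothetical and exist only for this comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DirectionDemo
{
    public const string Pattern = @"this.*?engine";

    // Default direction: scanning starts at the beginning of the input.
    public static Match LeftToRight(string text) =>
        Regex.Match(text, Pattern, RegexOptions.Singleline);

    // Right-to-left: scanning starts at the end of the input and moves backwards.
    public static Match RightToLeft(string text) =>
        Regex.Match(text, Pattern, RegexOptions.Singleline | RegexOptions.RightToLeft);
}
```

For the input "blah blah this is a this search engine blah", LeftToRight gives "this is a this search engine" (Index 10), because the scan anchors at the first "this". RightToLeft gives "this search engine" (Index 20), because the scan reaches "engine" first and then stops at the nearest "this" to its left. In both cases, a failed search yields a Match with Success set to false and an empty Value.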
