Append querystring to img tags within a string

I have a string like so:

<p>1</p><p><img src="https://somesite/1.png?x=1&y=2"></p>
<p>2</p><p><img src="https://somesite/2.png?x=1&y=2"></p>
<p>3</p><p><img src="https://somesite/3.png?x=1&y=2"></p>

It is the output of Kendo UI's editor.

I would like to append a tick value, something like &tick=2342342343, to every image src (I'm trying to work around a caching issue like the one described in another Stack Overflow question).

So that the output would look like this:

<p>1</p><p><img src="https://somesite/1.png?x=1&y=2&tick=2342342343"></p> 
<p>2</p><p><img src="https://somesite/2.png?x=1&y=2&tick=2342342343"></p>
<p>3</p><p><img src="https://somesite/3.png?x=1&y=2&tick=2342342343"></p>

I think a regular expression might be a good start:

var img = "img";
var imgRegExp = "<img src=\"[^\"]*\">";
Regex re = new Regex(imgRegExp);    
if (editorText!=null && editorText.Contains(img))
{
    //replace each editorText
}

In the end I opted for regex. HTML parsing wasn't working out, as @Wiktor-Stribiżew pointed out: the markup is only a few tags generated by an editor.

// Requires: using System; using System.Text.RegularExpressions;
private static string AppendQueryStringToIMG()
{
    string output = "<p>1</p><p><img src=\"https://a_dynamic_environment.file.core.windows.net/some-proj/my-images/img__1.png?x=123&y=234\"></p><p>2</p><p><img src=\"https://a_dynamic_environment.file.core.windows.net/some-proj/my-images/img__2.png?x=123&y=234\"></p><p>3</p><p><img src=\"https://a_dynamic_environment.file.core.windows.net/some-proj/my-images/img__3.png?x=123&y=234\"></p>";

    if (output != null && output.Contains("img"))
    {
        // Read the clock once so every image gets the same tick value.
        long ticks = DateTimeOffset.UtcNow.Ticks;

        // Rewrite each match in place. Calling output.Replace(href, ...) in a loop
        // would append the tick more than once to any URL that occurs several times.
        output = Regex.Replace(
            output,
            "(<img [^>]*?src=\")([^\"]*)\"",
            m => m.Groups[1].Value + m.Groups[2].Value + "&tick=" + ticks + "\"");
    }

    // output: the same string with &tick=<ticks> appended to each img src
    return output;
}
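
One gap worth noting: the method above always appends with &, which only gives a valid URL when the src already has a query string, as in the sample data. Here is a minimal sketch of choosing the separator per URL. The class and helper names (TickAppender, AppendTick) and the fixed tick value are hypothetical, not part of the original answer, and only double-quoted src attributes are handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TickAppender
{
    // Hypothetical helper: appends tick=<value> to every double-quoted img src.
    public static string AppendTick(string html, long tick)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return Regex.Replace(
            html,
            "(<img\\s[^>]*?src=\")([^\"]*)(\")",
            m =>
            {
                string url = m.Groups[2].Value;
                // Use '?' when the URL has no query string yet, '&' otherwise.
                string separator = url.Contains("?") ? "&" : "?";
                return m.Groups[1].Value + url + separator + "tick=" + tick + m.Groups[3].Value;
            },
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<p><img src=\"https://somesite/1.png?x=1&y=2\"></p>"
                    + "<p><img src=\"https://somesite/2.png\"></p>";
        // The first src gains "&tick=2342342343", the second "?tick=2342342343".
        Console.WriteLine(AppendTick(html, 2342342343));
    }
}
```

Passing the tick in as a parameter keeps the helper deterministic; in real use the caller would supply something like DateTimeOffset.UtcNow.Ticks.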

I agree with the comments saying that the HTML may change and that a regex might suddenly stop working if the HTML output changes. But sometimes a regex is far more efficient than loading a complete parser. So it depends on the risk of such changes and, if they do happen, on whether you have control over them (updates of Kendo UI, etc.).

For a regex solution, why not have a go with this: https://regex101.com/r/nJ3CL8/1

You can generate the code directly from the regex101 saved example.

My thoughts for a quick solution:

  • Keep in mind that whitespace can appear almost anywhere. Yes, even around the = sign!
  • Use the case-insensitive flag, as the tag could be written <IMG Src="..." />.
  • There can be other attributes between img and src, so capture them too.
  • An attribute value can be surrounded by single or double quotes, or by nothing at all. I didn't take the unquoted case into consideration, as it is unusual, particularly for a src attribute.

The pattern and the substitution strings would be this in C#:

string pattern = @"<\s*img\s*([^>]*?)src\s*=\s*([""'])(.*?)\2";

string substitution = @"<img ${1}src=${2}${3}&tick=123456789${2}";

Explanation:

  • \s* means any whitespace, zero or more times.
  • [^>]*? means any character except >, zero or more times, but ungreedy (not searching too far).
  • ([^>]*?) captures any attributes before the src attribute. It's capture n°1, written ${1} in the replacement string (.NET replacement strings use $, not \1).
  • (["']) captures the single or double quote. It's capture n°2, re-used later.
  • (.*?) captures the src value in an ungreedy way. It only works because the \2 backreference to the opening quote follows it.
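
To see the pattern applied, here is a minimal sketch using Regex.Replace with RegexOptions.IgnoreCase and ${n} group references in the replacement string. The class name, the AddTick helper, the sample HTML and the fixed tick value are all assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgTickDemo
{
    private const string Pattern = @"<\s*img\s*([^>]*?)src\s*=\s*([""'])(.*?)\2";

    // ${n} is the .NET replacement syntax for capture group n.
    private const string Substitution = "<img ${1}src=${2}${3}&tick=123456789${2}";

    // Hypothetical helper: applies the pattern case-insensitively.
    public static string AddTick(string html)
    {
        return Regex.Replace(html, Pattern, Substitution, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Mixed case, an extra attribute, spaces around '=' and single quotes.
        string html = "<p><IMG alt='a' Src = 'https://somesite/1.png?x=1&y=2'></p>"
                    + "<p><img src=\"https://somesite/2.png?x=1&y=2\"></p>";
        Console.WriteLine(AddTick(html));
    }
}
```

Note that the substitution rewrites the start of the tag as well, so <IMG ... Src = '...' comes out as <img ... src='...' with the original quote style preserved.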
