
Replace all 'src' attributes in HTML with 'data-src' attributes in PHP

I'm building an application that receives HTML e-mails.

For security reasons, we would like to disable the automatic loading of images in those e-mails.

The game plan:

  1. Sanitization
  2. Replace all src attributes in IMG tags with data-src attributes (using the same value) in PHP
  3. Add a button to the frontend
  4. On click, rename the data-src attribute back to src so the browser loads the images.
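Step 2 does not need a regex. The following is a minimal sketch that assumes PHP's bundled DOM extension is available. DOMDocument::loadHTML() runs libxml's forgiving HTML parser, which also copes with unquoted attributes and unclosed tags. The attribute is then moved only on real img elements. The function name deferImageSources is hypothetical. Charset handling is left out: loadHTML() assumes ISO-8859-1 unless the markup declares an encoding.

```php
<?php
// Hypothetical helper: move every <img src="..."> value into data-src.
function deferImageSources(string $html): string
{
    $doc = new DOMDocument();
    // loadHTML() raises PHP warnings for malformed markup; collect them
    // internally instead, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            // Keep the same value under data-src and drop the original src.
            $img->setAttribute('data-src', $img->getAttribute('src'));
            $img->removeAttribute('src');
        }
    }

    return $doc->saveHTML();
}
```

For a bare fragment, libxml adds a doctype and html/body wrappers, so saveHTML() returns a complete page. If you only need the fragment back, you may want to serialize just the children of body.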

My problem: I can write a basic regex to replace the src attribute in a simple <img src="abc.jpg" />, but I have no idea how to replace src with data-src in more complex, possibly malformed, HTML.

Am I overcomplicating this? Is there an easy way to solve this problem?

In the problematic case, you would have

src=

Of course, there can be whitespace before or after the =. One might be inclined to ignore matches inside apostrophes and quotes, but then you cannot protect yourself against more sophisticated attacks, such as

<script type="text/javascript">
    var img = '<img src="foo">';
    document.body.innerHTML += img;
</script>
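A DOM-based rewrite of img elements does not touch the <img src="foo"> above, because it is only text inside a script. It would therefore survive and be injected once the script runs. One mitigation is to drop every script element during sanitization (step 1). The following is a minimal sketch of that, using PHP's DOM extension. The name removeScripts is hypothetical. On its own this is not a complete sanitizer: event-handler attributes such as onerror and javascript: URLs are not handled here.

```php
<?php
// Hypothetical helper: parse the mail body and remove every <script> element.
function removeScripts(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list; copy it into an array first
    // so that removing nodes while iterating does not skip any of them.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    return $doc->saveHTML();
}
```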

The best approach would be to load the HTML in a browser-like environment and run

document.querySelectorAll("img[src]").length

after the page has loaded. If the result is 0, no img element has a src attribute. However, there is still the risk that a CSS file linked from the HTML defines a background-image property.
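PHP has no browser, but a static counterpart of that query can be sketched with DOMXPath. The assumption is that inspecting the parsed markup is enough. Unlike a browser after page load, this code runs no scripts and fetches no CSS, so images created at runtime or by stylesheets are invisible to it. The name countImgWithSrc is hypothetical.

```php
<?php
// Hypothetical helper: static analogue of
// document.querySelectorAll("img[src]").length (no scripts are executed).
function countImgWithSrc(string $html): int
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // //img[@src] matches every img element that has a src attribute.
    return $xpath->query('//img[@src]')->length;
}
```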

So I would implement a regular expression that guards against both src= and background-image: (allowing for whitespace). Best of all, do not allow JavaScript to run at all, and do not allow CSS to load from external sources either.
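One way to write such a check in PHP is a case-insensitive preg_match() that allows whitespace around the = and the :. The sketch below assumes a lookbehind, (?<![\w-]), which is an addition of this example: it keeps the data-src= attributes produced in step 2 from being flagged. The pattern is deliberately blunt. It also flags harmless text, such as the string inside the script example above. It misses other loading vectors, such as srcset, the background shorthand, or url(...) in other CSS properties. The name containsRemoteLoadHint is hypothetical.

```php
<?php
// Hypothetical last-line check: true if the markup still contains "src ="
// or "background-image :" (case-insensitive, optional whitespace).
function containsRemoteLoadHint(string $html): bool
{
    // (?<![\w-]) rejects "src" preceded by a word character or a hyphen,
    // so "data-src=" does not count as a hit.
    return preg_match('/(?<![\w-])src\s*=|background-image\s*:/i', $html) === 1;
}
```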
