I'm creating an application that receives HTML emails.
For security reasons, we would like to disable the automatic loading of images in those emails.
The game plan:
1. Replace the src attributes in img tags with data-src attributes (keeping the same value) in PHP.
2. When the images should be shown, change the data-src attribute back to src on the img tags and allow the browser to load the images.

My problem now: I can write a basic regex to replace the src attribute in a simple <img src="abc.jpg" />, but I have no idea how to replace src with data-src in more complex, and potentially even malformed, HTML.
Am I overthinking this? Is there an easy solution to this problem?
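For reference, the regex approach from the question can be made somewhat more tolerant of spacing, quoting and letter case. The following is a minimal JavaScript sketch (the function name moveImgSrcToDataSrc is hypothetical; PHP's preg_replace_callback could follow the same idea). It only renames src inside img start tags and is not an HTML parser, so malformed markup, a ">" inside an attribute value, "src=" text inside another attribute's value, srcset, and images loaded through CSS or scripts are not handled.

```javascript
// Hypothetical helper (regex sketch, not an HTML parser): renames src= to
// data-src= inside <img ...> start tags, keeping the original value.
function moveImgSrcToDataSrc(html) {
  // <img\b[^>]*> matches an img start tag case-insensitively; [^>]* stops at the
  // first '>', so a '>' inside a quoted attribute value cuts the match short.
  return html.replace(/<img\b[^>]*>/gi, (tag) =>
    // src must follow whitespace or '/', so data-src= and srcset= are left alone;
    // \s*= also accepts "src = ..." with spaces before the equals sign.
    tag.replace(/([\s\/])src(\s*=)/gi, "$1data-src$2")
  );
}

console.log(moveImgSrcToDataSrc('<p><img src="abc.jpg" /></p>'));
// prints <p><img data-src="abc.jpg" /></p>
```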
In the problematic case you would have
src=
somewhere in the markup; of course, there can be whitespace before or after the =. One might be tempted to ignore instances between apostrophes and quotes, but that does not protect you against more sophisticated attacks, such as:
<script type="text/javascript">
var img = '<img src="foo">';
document.body.innerHTML += img;
</script>
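To make the point concrete: a check that first discards everything between quotes never sees the src= in the example above, while a plain search for src= still catches it. A small JavaScript sketch, where both helper names are hypothetical:

```javascript
// The script example from above, as a string.
const html = `<script type="text/javascript">
var img = '<img src="foo">';
document.body.innerHTML += img;
</script>`;

// Naive check: drop everything between double or single quotes, then look for src=.
function naiveHasSrc(markup) {
  const withoutQuoted = markup.replace(/"[^"]*"|'[^']*'/g, "");
  return /\bsrc\s*=/i.test(withoutQuoted);
}

// Plain check: look for src= anywhere, quoted or not.
function plainHasSrc(markup) {
  return /\bsrc\s*=/i.test(markup);
}

console.log(naiveHasSrc(html)); // false: the src= sits inside '...'
console.log(plainHasSrc(html)); // true
```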
The best approach would be to parse the HTML in a browser-like environment and run
document.querySelectorAll("img[src]").length
after the page has loaded. If the result is 0, there are no img elements with a src attribute. However, you may still be exposed through CSS, for example a stylesheet linked into the HTML that defines a background-image property.
So I would implement a regular expression that checks for src= as well as background-image: (allowing for whitespace around the = and the :). Best of all, do not allow JavaScript to run at all, and do not allow external sources for CSS either.
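A minimal JavaScript sketch of such a check, assuming it runs on the raw email HTML before anything is rendered (the helper name containsImageLoaders is hypothetical). It is deliberately conservative and will also flag harmless text that happens to contain these patterns. It does not cover every way CSS can load an image; for example, the background shorthand property also accepts url(...), which these two patterns do not catch.

```javascript
// Hypothetical helper: conservative check for the two patterns discussed above.
// Case-insensitive, and \s* allows whitespace before "=" and ":".
const IMAGE_LOADER_PATTERNS = [
  /\bsrc\s*=/i,              // src=, SRC =, src   =
  /\bbackground-image\s*:/i, // background-image:, BACKGROUND-IMAGE :
];

function containsImageLoaders(html) {
  // No /g flag on the patterns, so test() keeps no state between calls.
  return IMAGE_LOADER_PATTERNS.some((re) => re.test(html));
}

console.log(containsImageLoaders('<img src = "a.png">'));                         // true
console.log(containsImageLoaders('<div style="BACKGROUND-IMAGE : url(x.png)">')); // true
console.log(containsImageLoaders("<p>Hello</p>"));                                // false
```

Returning a boolean rather than rewriting the markup keeps the check simple; what to do with a flagged message (reject it, strip it, or show it as plain text) is left open here.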