PHP: obfuscating mailto in source with htmlentities()

I am attempting to display email addresses on a page that function normally in a browser, but are obfuscated in code to hopefully get at least some spam bots to ignore them.

I have this test code:

<?php
$email = "fake@test.com";
$mailto = "mailto:" . $email;
?>
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>
<p>PHP: <a href="<?php echo htmlentities($mailto); ?>"><?php echo htmlentities($email); ?></a></p>
<p>&nbsp;</p>
<p>MANUAL: <a href="&#109;&#x61;&#105;&#108;&#116;&#x6f;&#58;&#102;&#x61;&#x6b;&#101;&#x40;&#x74;&#101;&#x73;&#x74;&#46;&#x63;&#111;&#x6d;">&#x66;&#97;&#107;&#x65;&#64;&#116;&#x65;&#x73;&#116;&#46;&#99;&#x6f;&#x6d;</a></p>
</body>
</html>

Both links look and work fine on the page, but only the 'manual' one is encoded.

I'm getting conflicting information from php.net on how htmlentities works.

http://php.net/manual/en/function.htmlentities.php

The documentation states that "all characters which have HTML character entity equivalents are translated into these entities." Since all letters in the alphabet HAVE equivalents, I expect every single char to be converted. But in the examples on that page, it demonstrates that basic letters do not get converted.

Further, when I view the source of my test page, it does not appear that the PHP code has done anything at all. My expectation was that both links would look the same in the source. Here are the results of 'view source':

<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>
<p>PHP: <a href="mailto:fake@test.com">fake@test.com</a></p>
<p>&nbsp;</p>
<p>MANUAL: <a href="&#109;&#x61;&#105;&#108;&#116;&#x6f;&#58;&#102;&#x61;&#x6b;&#101;&#x40;&#x74;&#101;&#x73;&#x74;&#46;&#x63;&#111;&#x6d;">&#x66;&#97;&#107;&#x65;&#64;&#116;&#x65;&#x73;&#116;&#46;&#99;&#x6f;&#x6d;</a></p>
</body>
</html>

So it looks like htmlentities() isn't doing anything at all. Not even encoding the '@'.

Should I be adding some flags? Is there a better way to do this? If I am successful will this even work against the bots or am I wasting my time?

The misunderstanding probably comes from this sentence in http://php.net/manual/en/function.htmlentities.php

This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.

What it really means becomes clearer from http://php.net/manual/en/function.htmlspecialchars.php

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings.

htmlspecialchars() encodes only &, ", ', < and > (the single quote only when ENT_QUOTES is set). Check:

print_r(get_html_translation_table(HTML_SPECIALCHARS));

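htmlentities() encodes more characters, but only those that have a named entity in its translation table, such as é (&eacute;) or © (&copy;). Letters, digits, '@', ':' and '.' are not in the default (HTML 4.01) table, which is why your address comes out unchanged. Check: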

print_r(get_html_translation_table(HTML_ENTITIES));
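To see the difference without reading the whole table, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{e9}" escape) and pins the flags and charset explicitly, because the defaults have changed between PHP versions. The sample strings are made up for illustration.

```php
<?php
// Pin flags and charset so the result does not depend on version defaults.
$flags = ENT_QUOTES | ENT_HTML401;

$plain    = 'mailto:fake@test.com';
$accented = "caf\u{e9} & <b>"; // "\u{e9}" is the letter é

// Nothing in the address has a named entity, so it passes through unchanged.
var_dump(htmlentities($plain, $flags, 'UTF-8') === $plain);

// htmlspecialchars() touches only the markup characters and leaves é alone.
echo htmlspecialchars($accented, $flags, 'UTF-8'), "\n";

// htmlentities() additionally turns é into its named entity.
echo htmlentities($accented, $flags, 'UTF-8'), "\n";

// Neither '@' nor a plain letter is a key in the HTML 4.01 entity table.
$table = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
var_dump(isset($table['@']), isset($table['a']));
```

So no combination of flags should be expected to make htmlentities() rewrite ordinary letters; a numeric encoding has to be produced some other way.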

You might look at something like this. I checked it in a link and it worked as expected:

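// Replace every character with its decimal numeric character reference.
// Assumes ASCII input: ord() works on single bytes, so multibyte UTF-8
// text would not be encoded correctly.
$result = preg_replace_callback('/./s', function ($m) {
    return '&#' . ord($m[0]) . ';';
}, 'mailto:fake@test.com');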

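This replaces each character in the string with &#, followed by the character's ASCII code in decimal, followed by ; (for example, m becomes &#109;). The browser decodes these references, so the link behaves the same as the plain one.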
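If you want output closer to the MANUAL link in the question, which mixes decimal and hexadecimal references, the same idea can be wrapped in a small helper. This is only a sketch: the function name encode_as_numeric_entities is made up for illustration, it assumes ASCII input, and it alternates the two notations by position rather than randomly so that the output is predictable.

```php
<?php
// Hypothetical helper (the name is illustrative, not a PHP built-in).
// Assumes $text is plain ASCII, since ord() works on single bytes.
function encode_as_numeric_entities(string $text): string
{
    if ($text === '') {
        return '';
    }

    $out = '';
    foreach (str_split($text) as $i => $char) {
        $code = ord($char);
        // Even positions use decimal (&#109;), odd positions use hex (&#x61;).
        $out .= ($i % 2 === 0)
            ? '&#' . $code . ';'
            : '&#x' . dechex($code) . ';';
    }
    return $out;
}

$email = 'fake@test.com';
$href  = encode_as_numeric_entities('mailto:' . $email);
$label = encode_as_numeric_entities($email);

echo '<a href="' . $href . '">' . $label . '</a>';
```

Passing the result through html_entity_decode() gives back the original string, which is a quick way to check that nothing was lost in the encoding.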
