I'm working on an IMDb data scraper for a site, and they seem to encode everything in a weird encoding I have never seen before.
<a href="/keyword/exploding-ship/">Exploding Ship</a>
A Bug&#x27;s Life
Is there a PHP function that will convert these to regular characters?
This is not an encoding; these are HTML entities written as hexadecimal codes.
Try:
$converted = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
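As a quick illustration of that call, here is a minimal sketch. The input string is an assumption modeled on the movie title in the question, not actual scraper output.

```php
<?php
// Sample input (assumed): a title containing a hexadecimal entity.
$string = 'A Bug&#x27;s Life';

// ENT_QUOTES makes the function decode single-quote entities as well as
// double-quote ones; 'UTF-8' is the encoding of the returned string.
$converted = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

echo $converted, "\n"; // prints: A Bug's Life
```

ENT_QUOTES matters here because the apostrophe is one of the characters that is left encoded when only double quotes are converted.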
Those are SGML character escapes. They can be either decimal (for example &#39;) or hexadecimal (for example &#x27;) and refer directly to a Unicode code point.
html_entity_decode() should work in PHP 5, though I can't test it at the moment.
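To make the decimal/hexadecimal point concrete, the following sketch decodes both spellings of the same code point and one code point outside ASCII. The sample strings are illustrative assumptions.

```php
<?php
// Decimal 39 and hexadecimal 27 name the same code point, U+0027 (apostrophe).
$fromDecimal = html_entity_decode('&#39;', ENT_QUOTES, 'UTF-8');
$fromHex     = html_entity_decode('&#x27;', ENT_QUOTES, 'UTF-8');
var_dump($fromDecimal === $fromHex); // bool(true)

// U+00E9 (e with acute accent) becomes a two-byte sequence in UTF-8.
$eAcute = html_entity_decode('&#xE9;', ENT_QUOTES, 'UTF-8');
echo bin2hex($eAcute), "\n"; // prints: c3a9
```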
In the first comment on that reference page, the following code is given for older PHP versions:
// For users prior to PHP 4.3.0 you may do this:
// Note: this relies on the preg "e" modifier, which was removed in PHP 7.0.
function unhtmlentities($string)
{
    // replace numeric entities
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}
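For current PHP versions, html_entity_decode() remains the simpler choice. Purely as a rough sketch of the same numeric-escape replacement, the version below swaps in preg_replace_callback() and mb_chr(), so code points above 255 come out as UTF-8 rather than single bytes. The function name decode_numeric_entities is hypothetical, and the sketch assumes PHP 7.2 or later with the mbstring extension loaded; it does not handle named entities.

```php
<?php
// Hypothetical helper (not from the original answer).
// Assumes PHP 7.2+ with the mbstring extension available.
function decode_numeric_entities(string $string): string
{
    $result = preg_replace_callback(
        '~&#(?:x([0-9a-f]+)|([0-9]+));~i',
        function (array $m): string {
            // Group 1 holds hexadecimal digits, group 2 decimal digits.
            $codePoint = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            $char = mb_chr((int) $codePoint, 'UTF-8');
            // Leave the escape untouched if the code point is not valid.
            return $char === false ? $m[0] : $char;
        },
        $string
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $string;
}

echo decode_numeric_entities('A Bug&#x27;s Life'), "\n"; // prints: A Bug's Life
```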