
How can I convert special characters in PHP and SQL?

I am learning cURL to fetch data from a site. Everything works fine with cURL except for special characters. When I look at the source of the site, it has the following items.

<li class="page_item page-item"><a href="../categories/mens-health/">Men&#8217;s Health</a></li>
<li class="page_item page-item"><a href="../categories/nails-hair-skin/">Nails, Hair &#038; Skin</a></li>
<li class="page_item page-item"><a href="../categories/womens-health/">Women’s Health</a></li>  

When I collect the data into an array and echo it in the browser, I get the following result

Men&#8217;s Health  
Nails, Hair &#038; Skin  
Women’s Health

which I got by executing the following code

$search = array('&#146;');
$replace = array("'");  
$category_names[] = htmlentities(str_replace($search, $replace, $word), ENT_QUOTES);

$word is each of the 3 array items above. However, I am not able to convert them to the proper characters when inserting them into the database. This is how they appear in my DB

Men&amp;#8217;s Health
Nails, Hair &amp;#038; Skin
Women&rsquo;s Health

How can I insert them in the proper format, as follows?
Men's Health
Nails, Hair & Skin
Women's Health

I checked some of the solutions for handling an apostrophe, but they mostly cover single insert statements, whereas I am inserting in a loop.

Way to insert text having ' (apostrophe) into a SQL table
How do I escape a single quote in SQL Server?

I ran html_entity_decode($category_names[$i]); and now I get the following result in my database
Men’s Health
Nails, Hair & Skin
Women’s Health

html_entity_decode will decode HTML entities, including numeric character references (NCRs). For example, &#8217; will become ’.

<?php
$in = 'Men&#8217;s Health  
Nails, Hair &#038; Skin  
Women’s Health';

echo html_entity_decode($in);

will print

Men’s Health  
Nails, Hair & Skin  
Women’s Health

The code above is hosted here: http://ideone.com/1rWL45
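
To tie this back to the loop in the question, here is a minimal sketch that decodes each item instead of encoding it with htmlentities. The $words array is hypothetical sample data, and folding the typographic apostrophe (U+2019) to a plain ASCII one is optional; it is only there because the question asks for Men's rather than Men’s.

```php
<?php
// Hypothetical sample data mirroring the scraped category names.
$words = [
    'Men&#8217;s Health',
    'Nails, Hair &#038; Skin',
    'Women’s Health',
];

$category_names = [];
foreach ($words as $word) {
    // Decode named and numeric entities into real UTF-8 characters.
    $decoded = html_entity_decode($word, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Optionally fold the typographic apostrophe (U+2019) to ASCII.
    $category_names[] = str_replace("\u{2019}", "'", $decoded);
}

print_r($category_names);
```

The decoded strings contain no entities, so nothing is double-encoded into &amp;#8217; when they are stored. Escaping for HTML is then better left to the moment the values are printed on a page.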

EDIT

Your DB table might be in Latin1, and inserting Unicode characters (e.g. ’) into it will result in such mangled characters. Simply replacing a few Unicode characters with ASCII equivalents may mitigate part of your encoding problem. However, I recommend altering the table's character set to UTF-8.

<?php

$map = [ '’' => "'", "..." => "..." ]; // from->to pairs
$normalized = str_replace(array_keys($map), array_values($map), $string);
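
The sketch below shows one way such mangling could arise. It assumes the UTF-8 bytes are being interpreted as Windows-1252 (the encoding MySQL labels latin1) somewhere between PHP and the table; your actual setup may differ. It needs the mbstring extension.

```php
<?php
// U+2019 is stored in UTF-8 as the three bytes E2 80 99.
$original = "Men\u{2019}s Health";

// Simulate a layer that wrongly treats those bytes as Windows-1252
// and converts them to UTF-8 a second time.
$mangled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $mangled, PHP_EOL; // Men’s Health

// Reversing the wrong conversion recovers the original text.
$repaired = mb_convert_encoding($mangled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $original); // bool(true)
```

If the stored text looks like the mangled line, the likely cause is a mismatch between the connection or table character set and the bytes PHP sends, not a problem with the entity decoding.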

Maybe jQuery's .html() and .text() functions can help you. For example:

html

<div id="test">&lt;&lt;</div>

jquery

var t = $('#test');
t.html(t.text());

Maybe this JSFiddle link can help you.

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

htmlspecialchars — Convert special characters to HTML entities

string htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = true ]]] )

If the input string passed to this function and the final document share the same character set, this function is sufficient to prepare input for inclusion in most contexts of an HTML document. If, however, the input can represent characters that are not coded in the final document character set and you wish to retain those characters (as numeric or named entities), both this function and htmlentities() (which only encodes substrings that have named entity equivalents) may be insufficient. You may have to use mb_encode_numericentity() instead.

The translations performed are:

'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
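
As a quick illustration of the translations listed above, the following sketch runs htmlspecialchars on a sample string and on one of the already-encoded names from the question. The sample strings are made up for the example; the flags are passed explicitly because the defaults differ between PHP versions.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;

// All five special characters are translated.
$html = htmlspecialchars('<a href="x">Tom & Jerry\'s</a>', $flags, 'UTF-8');
echo $html, PHP_EOL;
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#039;s&lt;/a&gt;

// Encoding text that already contains an entity escapes its ampersand again.
$encoded = 'Men&#8217;s Health';
$double = htmlspecialchars($encoded, $flags, 'UTF-8');
echo $double, PHP_EOL; // Men&amp;#8217;s Health

// With $double_encode = false, existing entities are left alone.
$single = htmlspecialchars($encoded, $flags, 'UTF-8', false);
echo $single, PHP_EOL; // Men&#8217;s Health
```

The second call reproduces the Men&amp;#8217;s Health value seen in the database: the string was encoded while it still contained an entity.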
