
Having trouble saving strings to MySQL in UTF-8

I'm having trouble with the encoding of a file. My PHP program reads a .txt file, loops over it with foreach to build an array of rows, and then inserts that data into my database. After the insert, the Polish letters are missing from the database. My database, table and all fields use utf8_unicode_ci, and when I insert data through phpMyAdmin all the letters are fine. I tried mb_detect_encoding($row), and it reports ASCII. How can I insert Polish letters into my database? Please help.

My DB connection:

try {
  $dbh = new PDO('mysql:dbname=google;host=localhost;', 'root', '');
  $dbh->setAttribute(PDO::ATTR_DEFAULT_FETCH_MODE, PDO::FETCH_ASSOC);
  $dbh->query('SET NAMES utf8');
  $dbh->query('SET CHARACTER_SET utf8_unicode_ci');
} catch (PDOException $e) {
  // report connection errors
  echo 'Connection failed: ' . $e->getMessage();
}

I tried:

$url = mb_convert_encoding($url,'UTF-8',mb_detect_encoding($url));

and

$url = Encoding::toUTF8($url);

and, of course, iconv. Any other ideas?

The insert statement itself looks fine; here is an example:

PDOStatement Object ( [queryString] => insert into `site` values ("","meblegdańsk.pl","1") ) 

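Your MySQL query is wrong. It's SET CHARACTER SET, not SET CHARACTER_SET (notice the space instead of the underscore between CHARACTER and SET), and the statement takes a character set name such as utf8, not a collation such as utf8_unicode_ci: SET CHARACTER SET utf8. This may be the most likely cause of your problems.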

On some unusual MySQL configurations you may need to set other encoding-related variables as well (but you usually don't, so do not change these unnecessarily): http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

By the way, this won't work on its own (unless you've called mb_detect_order() first):

$url = mb_convert_encoding($url,'UTF-8',mb_detect_encoding($url));

If you want to convert Latin-2 (ISO-8859-2) characters to UTF-8 but leave them alone if they are already UTF-8, you should do:

$url = mb_convert_encoding($url, 'UTF-8', array('UTF-8', 'ISO-8859-2'));
// or
mb_detect_order(array('UTF-8', 'ISO-8859-2'));
$url = mb_convert_encoding($url, 'UTF-8', mb_detect_encoding($url));

Apologies if you've already called mb_detect_order() like this. A note to everyone else: replace ISO-8859-2 with whichever other encoding you expect your input to use, which usually depends on the language of the text. In most Western European countries, ISO-8859-1 is the typical single-byte character encoding.

Either way, this checks whether the given string is valid UTF-8 (and if so, leaves it unchanged); if it isn't valid UTF-8, it assumes the string is ISO-8859-2 and converts it. The order matters because every string is valid ISO-8859-2, so if ISO-8859-2 came first you would never be able to "fall back" to UTF-8. I was also assuming that when you said ASCII, you meant ISO-8859-2 (they are not the same thing).
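The UTF-8-first rule can also be written out explicitly with mb_check_encoding(), which makes the reason for the order visible and avoids depending on mbstring's detection heuristics (these changed between PHP versions). A minimal sketch, assuming PHP 7+ with the mbstring extension and ISO-8859-2 as the fallback; the helper name toUtf8 is hypothetical:

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is; otherwise assume the bytes
// are ISO-8859-2 (Latin-2) and convert them to UTF-8.
function toUtf8(string $s, string $fallback = 'ISO-8859-2'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

$latin2 = "meblegda\xF1sk.pl";     // "ń" as the single Latin-2 byte 0xF1
$utf8   = "meblegda\u{0144}sk.pl"; // "ń" as the UTF-8 bytes 0xC5 0x84

var_dump(toUtf8($latin2) === $utf8); // bool(true): converted
var_dump(toUtf8($utf8) === $utf8);   // bool(true): left alone

// Why UTF-8 must be checked first: ISO-8859-2 assigns a character to
// every byte, so even the UTF-8 bytes "pass" as ISO-8859-2 ...
var_dump(mb_check_encoding($utf8, 'ISO-8859-2')); // bool(true)
// ... and converting them from ISO-8859-2 silently double-encodes "ń".
var_dump(mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-2') === $utf8); // bool(false)
```

Checking the fallback first would therefore "succeed" on every input, including input that was already correct UTF-8.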

Also, for completeness: make sure each column in your tables is also set to use a utf8 encoding.

I suspect that mb_detect_encoding() doesn't work the way you think:

string mb_detect_encoding ( string $str [, mixed $encoding_list = mb_detect_order() [, bool $strict = false ]] )

If you omit the second argument, it defaults to mb_detect_order(), which often leaves you choosing between just two encodings:

Array
(
    [0] => ASCII
    [1] => UTF-8
)

In the end, you're asking whether your Polish text is ASCII or UTF-8 and converting the result to UTF-8. Problems with that:

  • ASCII is a subset of UTF-8. Converting from ASCII to UTF-8 will never alter your input data.
  • ASCII cannot encode Polish characters.
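Both points can be checked directly with mb_check_encoding(). A short sketch, assuming PHP 7+ with the mbstring extension; the example strings are taken from the question:

```php
<?php
$ascii  = "meble.pl";      // plain ASCII
$polish = "gda\u{0144}sk"; // contains "ń", UTF-8 encoded

// ASCII is a subset of UTF-8: valid in both, and conversion is a no-op.
var_dump(mb_check_encoding($ascii, 'ASCII'));                        // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8'));                        // bool(true)
var_dump(mb_convert_encoding($ascii, 'UTF-8', 'ASCII') === $ascii);  // bool(true)

// ASCII has no code point for "ń", so the Polish string is not ASCII.
var_dump(mb_check_encoding($polish, 'ASCII'));                       // bool(false)
```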

Without sample data it's hard to say why you get ASCII as the result. PHP is probably defaulting to ASCII when the text is clearly not UTF-8, and leaving the strict detection flag set to false doesn't help.

I suggest you rethink why you need to detect the encoding in the first place. If the application cannot require input files to be in a particular encoding and there's no way to change that, compile a list of the encodings typically used for Polish text and pass it to mb_detect_encoding().
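Putting both suggestions together: with strict detection and only the usual two-entry default list, Latin-2 input is rejected outright (mb_detect_encoding() returns false) rather than mislabelled, and adding the expected single-byte encoding to the list lets it be identified. A minimal sketch, assuming PHP 7+ with mbstring and input in ISO-8859-2; the candidate list is an assumption to adapt to your own files:

```php
<?php
$latin2 = "meblegda\xF1sk.pl"; // "ń" as the Latin-2 byte 0xF1

// Strict mode, ASCII/UTF-8 only: byte 0xF1 is invalid in both, so no match.
var_dump(mb_detect_encoding($latin2, ['ASCII', 'UTF-8'], true));
// bool(false)

// Assumed candidate list for Polish input, with UTF-8 checked first.
$candidates = ['UTF-8', 'ISO-8859-2'];
$encoding = mb_detect_encoding($latin2, $candidates, true);
var_dump($encoding);
// string(10) "ISO-8859-2"

// Convert using the detected encoding.
$fixed = mb_convert_encoding($latin2, 'UTF-8', $encoding);
var_dump($fixed === "meblegda\u{0144}sk.pl");
// bool(true)
```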

BTW, the recommended way to set the connection encoding is the charset parameter in the DSN:

$dbh = new PDO('mysql:dbname=google;host=localhost;charset=utf8','root','');  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM