CodeIgniter UTF-8 encoding issue

The Issue

I've been having trouble with what I think is a UTF-8 encoding issue: some posts are not being saved to my database.

The issue occurs when a user copies and pastes text from MS Word. A particular combination of characters seems to cause it (I haven't yet found any other combination that triggers the same problem):

  • % b
  • % B

As a result, when I var_dump() the input, I get:

string(5) "70 ck"

Instead of:

string(8) "70% back"

Edit: The database error I get is:

Incorrect string value: '\xBAck an...' for column [...]
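The var_dump() output and the MySQL error agree at the byte level: "70", a single byte 0xBA, then "ck" is exactly five bytes, and 0xBA on its own is not valid UTF-8, which would explain why MySQL refuses the value. 0xBA is also exactly what percent-decoding "%ba" produces, so one hedged reading is that the "% ba" in "70% back" is being URL-decoded somewhere between the browser and the database. Nothing in the thread confirms which layer, if any, does that. This small sketch only checks the byte-level facts; the variable names are illustrative.

```php
<?php
// Value implied by var_dump() and the MySQL error message:
// "70", then the single byte 0xBA, then "ck".
$stored = "70" . "\xBA" . "ck";

var_dump(strlen($stored));   // int(5), matching string(5) in the question

// With the /u modifier, preg_match() fails (returns false) on invalid UTF-8,
// so this works as a validity check without needing the mbstring extension.
$isValidUtf8 = preg_match('//u', $stored) === 1;
var_dump($isValidUtf8);      // bool(false): 0xBA is a continuation byte with no lead byte

// Percent-decoding "%ba" produces exactly that byte...
var_dump(rawurldecode('%ba') === "\xBA");            // bool(true)

// ...but plain decoding leaves the spaced text alone, since "% b" is not an escape.
var_dump(rawurldecode('70% back') === '70% back');   // bool(true)
```

Because plain rawurldecode() does not touch "% b", if decoding is the culprit, whatever performs it would also have to be skipping the whitespace after the %.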


What I've tried

I'm using the Summernote JS plugin. I've also tried a different plugin (WYSIHTML5) and no plugin at all, and I've tried pasting the clipboard contents as plain text. I even have an onPaste callback on the Summernote editor that strips all the stupid encoding/styling MS Word adds (which I think is a Summernote-specific issue).

Unfortunately, searching for 'encoding issue "% b"' and variations of it hasn't got me anywhere. My guess is that the character combination above is somehow being translated into a character the database doesn't support.

  • The database is MySQL 5.7.10, with utf8_general_ci collation on all columns.
  • I've set the charset to UTF-8 in CodeIgniter's config: $config['charset'] = 'UTF-8';
  • In CodeIgniter's database config I've specified 'char_set' => 'utf8', 'dbcollat' => 'utf8_general_ci'
  • The page's meta tag is set to use utf-8: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • The form has the accept-charset="utf-8" attribute
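For reference, the server-side settings listed above would look roughly like this in a CodeIgniter project. This assumes the CodeIgniter 3 array format for database.php (the question doesn't state the version), and the connection values are placeholders, not real credentials.

```php
<?php
// application/config/config.php
$config['charset'] = 'UTF-8';

// application/config/database.php (CodeIgniter 3 array format assumed).
// hostname/username/password/database are placeholders.
$db['default'] = array(
    'hostname' => 'localhost',
    'username' => 'db_user',
    'password' => 'db_password',
    'database' => 'db_name',
    'dbdriver' => 'mysqli',
    // Charset and collation names exactly as MySQL spells them.
    'char_set' => 'utf8',
    'dbcollat' => 'utf8_general_ci',
);
```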

Update: I've also tried the solution suggested in this question


I think I've done all the usual troubleshooting, and I'm a bit stuck. Does anyone know why this specific combination of characters causes the problem? Or am I wrong, and it's not an encoding issue at all? Does anyone have any other ideas?

Answer

You should look into doing more on the front-end side. Try setting the encoding on the form; most browsers should then send only UTF-8 to your server:

<form ... accept-charset="UTF-8">
   ...
</form>
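accept-charset only tells the browser which encoding to submit; it cannot stop a malformed byte from being introduced later on the server. A possible server-side complement is to reject non-UTF-8 text before it reaches the INSERT, so the failure shows up with a clear message instead of a database error. This is a minimal sketch; require_utf8 is a hypothetical helper, not a CodeIgniter function.

```php
<?php
/**
 * Hypothetical helper (not part of CodeIgniter): return $value unchanged if it
 * is valid UTF-8, otherwise throw, so a bad byte is caught before the INSERT.
 */
function require_utf8(string $value): string
{
    // preg_match() with /u returns false when $value is not valid UTF-8.
    if (preg_match('//u', $value) !== 1) {
        throw new InvalidArgumentException('Submitted text is not valid UTF-8');
    }
    return $value;
}

// Example: the text from the question passes, the mangled value does not.
var_dump(require_utf8('70% back'));   // string(8) "70% back"

try {
    require_utf8("70\xBAck");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;   // Submitted text is not valid UTF-8
}
```

In a controller this might be called as require_utf8($this->input->post('body')), where 'body' is a placeholder field name. post() returns NULL for a missing field, which the string type declaration would reject with a TypeError.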

See this answer for more detail.

Also, if you are using an editor, check out Quill, which supports pasting from Word.
