How to handle character encoding in PHP - Codeigniter?

What is the best way to convert user input to UTF-8?

I have a simple form where a user submits HTML. The HTML can be in any language and in any character encoding.

My question is:

  • Is it possible to represent everything as UTF-8?

  • What can I use to effectively convert any character encoding to UTF-8, so that I can parse it with PHP string functions, save it to my database, and subsequently echo it out using htmlentities?

I am trying to work out how to best implement this - advice and links appreciated.

I am using CodeIgniter and its input class to retrieve POST data.

A few points I should make:

  • I need to convert HTML special characters to their respective entities
  • It might be a good idea to accept any encoding and return the data in that same encoding. However, my web app uses:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

This might have an adverse effect on things.

Specify accept-charset in your <form> tag to tell the browser to submit user-entered data encoded in UTF-8:

<form action="foo" accept-charset="UTF-8">...</form>

See HOW TO Use UTF-8 Throughout Your Web Stack for a complete guide.

Is it possible to represent everything as UTF-8?

Yes, UTF-8 is a Unicode encoding, so you can use any character defined in Unicode. That's the best you can do with a computer to date.

What can I use to effectively convert any character encoding to UTF-8

iconv lets you convert virtually any encoding to any other encoding. But for that you have to know what encoding you're dealing with. You can't say "iconv, whatever this is, make it UTF-8!". That's unfortunately not how it works. You can only say "iconv, I have this string here in BIG5, please convert that to UTF-8.".
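As a minimal sketch of that point, the conversion below only works because the source encoding is stated explicitly. The helper name `to_utf8` is hypothetical, and ISO-8859-1 stands in for whatever encoding you actually know the input to be in.

```php
<?php
// Hypothetical helper: convert text from a KNOWN source encoding to UTF-8.
// iconv cannot guess the source encoding; the caller has to supply it.
function to_utf8(string $text, string $sourceEncoding): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $text);
    if ($converted === false) {
        // iconv returns false when the input is not valid in the stated encoding.
        throw new InvalidArgumentException("Input is not valid $sourceEncoding");
    }
    return $converted;
}

// "café" with "é" stored as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9";

// After conversion, "é" is the two-byte UTF-8 sequence 0xC3 0xA9.
$utf8 = to_utf8($latin1, 'ISO-8859-1');
```

The same call with the wrong source encoding would still return a string, just with the wrong characters, which is why knowing the real encoding matters.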

If you're only dealing with form data in UTF-8 though, you'll probably never need to convert anything.

so that I can parse it with PHP string functions

"PHP string functions" work on bytes. They don't care about characters or encodings. Depending on what you want to do, working with naive PHP string functions on UTF-8 text will give you bad results. Use encoding-aware string functions in the MB extension for any multi-byte encoding string manipulation.
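To illustrate the difference, here is a small sketch comparing byte-oriented functions with their mbstring counterparts on a UTF-8 string. It assumes the mbstring extension is loaded; the sample word is only an illustration.

```php
<?php
// "naïve" in UTF-8: the "ï" takes two bytes (0xC3 0xAF).
$word = "na\xC3\xAFve";

// Byte-oriented functions count and cut bytes, not characters.
$byteLength = strlen($word);          // 6 bytes
$byteCut    = substr($word, 0, 3);    // "na" plus half of "ï": invalid UTF-8

// mbstring functions take the encoding into account.
$charLength = mb_strlen($word, 'UTF-8');         // 5 characters
$charCut    = mb_substr($word, 0, 3, 'UTF-8');   // "naï"
```

Passing 'UTF-8' explicitly avoids depending on the configured internal encoding.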

save it to my database

Just make sure your database stores text in UTF-8 and that you have set your database connection to UTF-8 (i.e. the database knows you're sending it UTF-8 data). You should be able to specify that in the CodeIgniter database connection settings.

subsequently echo out using htmlentities?

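Just echo htmlentities($text, ENT_QUOTES, 'UTF-8'). Passing the encoding explicitly matters on PHP versions before 5.4, where the default was ISO-8859-1; beyond that, there is nothing more you need to do.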
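A short sketch of the output step, assuming the stored text is already UTF-8. The helper name `escape_for_html` and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: escape UTF-8 text for safe output inside HTML.
// The encoding is passed explicitly instead of relying on the PHP default.
function escape_for_html(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// User-supplied markup containing a non-ASCII character ("ë" as UTF-8).
$input = "<b>Zo\xC3\xAB</b> & \"friends\"";

// Markup characters, quotes, and "ë" are all replaced by named entities.
$safe = escape_for_html($input);
```

If you only need to neutralise markup and want to keep non-ASCII characters as they are, htmlspecialchars with the same arguments is the lighter option.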

However, my web app is making use of : <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

This might have an adverse effect on things.

Not at all. It just signals to the browser that your page is encoded in UTF-8. Now you just need to make sure that's actually the case (as you're trying to do anyway). It also implies to the browser that it should send UTF-8 to the server. You can make that explicit with the accept-charset attribute on forms.

May I recommend What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text, which might help you understand more.

1) Is it possible to represent everything as UTF-8?

Yes, everything defined in Unicode. That's the most you can get nowadays, and Unicode has room to add more characters in the future.

2) What can I use to effectively convert any character encoding to UTF-8 so that I can parse it with PHP string functions and save it to my database and subsequently echo out using htmlentities?

The only thing you need to know is the actual encoding of your data. If you want your web application to support UTF-8 for input and output, the frontend needs to signal that it supports UTF-8. See Character Encodings for a guide regarding your application's user interface.

Within PHP, you need to feed each function data in an encoding it supports. Some functions need the encoding specified as a parameter; for others you need to convert the data first. Always check the function's documentation to see whether it supports what you ask of it. Additionally, check your PHP configuration.

Related:

  1. Preparing PHP application to use with UTF-8
  2. How to detect malformed utf-8 string in PHP?
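On the second related question, one possible way to detect malformed UTF-8 is mb_check_encoding. This is a sketch assuming the mbstring extension is available; the helper name `is_valid_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: true when the byte string is well-formed UTF-8.
function is_valid_utf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// "grüß" encoded as UTF-8.
$good = "gr\xC3\xBC\xC3\x9F";

// The same word as ISO-8859-1 bytes; 0xFC can never appear in UTF-8.
$bad = "gr\xFC\xDF";

$goodIsValid = is_valid_utf8($good);
$badIsValid  = is_valid_utf8($bad);
```

A check like this tells you whether input is well-formed UTF-8, not which encoding malformed input was actually in.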

The only thing I found that works for UTF-8 encoding is setting the following in my config.php:

putenv('LC_ALL=en_US.utf8'); // or whatever language you need
setlocale(LC_ALL, 'en_US.utf8');  // or whatever language you need
bindtextdomain("mydomain", dirname(__FILE__) . "/../language");
textdomain("mydomain");

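If you want to change the encoding of a string, you can try:

// Without a third argument, the source is assumed to be in mbstring's internal encoding.
$utf8_string = mb_convert_encoding($yourBadString, 'UTF-8');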

EDIT:

Is it possible to represent everything as UTF-8?

Yes, this is what you need to ensure:

  • HTML: headers/meta header set to UTF-8
  • all files saved as UTF-8
  • database collation, tables and data encoding set to UTF-8

What can I use to effectively convert any character encoding to UTF-8

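You can use utf8_encode before saving the value to your database, but only when the input is ISO-8859-1, since that is the only source encoding the function handles. (A system set up mainly for Western European languages will generally use ISO-8859-1 or a close relation.) Note that utf8_encode is deprecated as of PHP 8.2.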

// eg
$name = utf8_encode($this->input->post('name'));
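One caveat worth a sketch: converting input that is already UTF-8 as if it were ISO-8859-1 double-encodes it. The helper `ensure_utf8` below is hypothetical, uses mb_convert_encoding in place of utf8_encode, and treats ISO-8859-1 as an assumed fallback. The validity check is a heuristic, because a few ISO-8859-1 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: convert only when the bytes are not already valid UTF-8.
// The fallback encoding is an assumption about where the data came from.
function ensure_utf8(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already well-formed UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

$alreadyUtf8 = "caf\xC3\xA9"; // "café" in UTF-8
$latin1      = "caf\xE9";     // "café" in ISO-8859-1

// Blind conversion treats each UTF-8 byte as a Latin-1 character: "cafÃ©".
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

// The guarded version leaves valid UTF-8 alone and converts the rest.
$fromUtf8   = ensure_utf8($alreadyUtf8);
$fromLatin1 = ensure_utf8($latin1);
```

If the form is submitted as UTF-8 in the first place, no conversion step is needed at all.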

And as I mentioned before, you need to make sure the database collation, tables and data encoding are set to UTF-8. In CI, set this in your database connection config:

// Make sure you have these lines
$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';
