I am trying to use UTF-8 characters in CGI scripts.
I am using the following header for the CGI script:
#! /usr/bin/perl
#
use utf8;
use open ':std' => ':encoding(UTF-8)';
use CGI '-utf8';
my $q = CGI->new();
my %params = $q->Vars;
print $q->header( -type => "text/html", -charset => "UTF-8" );
print $q->start_html( -encoding => "UTF-8" );
The issue is that whenever I print something to standard output, I get output on the browser that looks like:
st\xE1n
instead of
stán
Any ideas what's wrong?
The line use CGI '-utf8'; tells CGI.pm that the script's inputs are encoded using UTF-8.
The message utf8 "\xE1" does not map to Unicode means your input wasn't encoded using UTF-8.
The script doesn't output stán because stán, encoded as UTF-8, wasn't what was provided to the script.
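To make the difference concrete, here is a minimal sketch comparing the two byte sequences. It assumes the input arrived as ISO-8859-1, which is what a lone E1 byte for á suggests; the sample values and the helper name decodes_as_utf8 are illustrative only and do not come from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "stán" as raw bytes in two different encodings (sample values).
my $latin1_bytes = "st\xE1n";        # ISO-8859-1: á is the single byte E1
my $utf8_bytes   = "st\xC3\xA1n";    # UTF-8: á is the two bytes C3 A1

# Hypothetical helper: returns 1 if the bytes are well-formed UTF-8, else 0.
sub decodes_as_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $latin1_ok = decodes_as_utf8($latin1_bytes);   # E1 followed by "n" is malformed
my $utf8_ok   = decodes_as_utf8($utf8_bytes);     # C3 A1 is a valid sequence

# Decoding the valid bytes yields a four-character string.
my $text = decode('UTF-8', $utf8_bytes);
```

If the form that submits the data is served as UTF-8, the browser should normally send the second form, which is the one that use CGI '-utf8'; can decode.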
As @ikegami mentioned, your input does not look like UTF-8.
In general, to make your CGI output valid UTF-8, you should do two things:
Make sure your browser understands that you are sending it UTF-8. You already did that.
Make sure the values you print are decoded character strings. This is the part that causes the most problems. A value that comes from a database, a file, or another external source usually arrives as raw UTF-8 bytes, and it has to be decoded before printing; otherwise the :encoding(UTF-8) output layer encodes it a second time. In most cases that means explicitly running utf8::decode on the scalar. For example, if $stan is the variable holding the value you print, put the following line before printing it:
utf8::decode($stan);
The use utf8; directive in your source means that the script itself is written in UTF-8, so string constants in the source are already decoded and you don't need to call utf8::decode on them. Likewise, with use CGI '-utf8'; the CGI parameters are decoded for you. But if your stán comes from some other external source, such as a database, you still need to utf8::decode it.
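As a rough illustration of what utf8::decode does to such a value, the sketch below uses hard-coded sample bytes in place of a real database or file read. The variable names are hypothetical. Note that utf8::decode works in place and returns false when the bytes are not well-formed UTF-8, so its return value is worth checking.

```perl
use strict;
use warnings;

# Sample value as it might arrive from an external source:
# raw UTF-8 bytes, not yet decoded into characters.
my $stan = "st\xC3\xA1n";
my $bytes_before = length $stan;       # counts bytes at this point

# Decode in place; the return value reports whether the bytes were valid UTF-8.
my $ok = utf8::decode($stan);
my $chars_after = length $stan;        # now counts characters

# Latin-1 bytes are not valid UTF-8: the call fails and leaves the value as is.
my $latin1 = "st\xE1n";
my $latin1_ok = utf8::decode($latin1);

# Encoding the decoded string again gives back the original bytes, which is
# what an :encoding(UTF-8) output layer does when the string is printed.
my $out = $stan;
utf8::encode($out);
```

Here the length drops from 5 to 4 after decoding because the two bytes C3 A1 become the single character á.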