Perl: Convert (high) decimal NCR to UTF-8

I have this string (decimal NCRs): &#26085;&#26412;&#12398;&#37756;&#28792;&#12392;&#12399;

It represents the Japanese text 日本の鍼灸とは.

But I need (UTF-8): %E6%97%A5%E6%9C%AC%E3%81%AE%E9%8D%BC%E7%81%B8%E3%81%A8%E3%81%AF

For the first character: &#26085; ⇒ %E6%97%A5

This site does it, but how do I get this in Perl? (If possible in a single regex like s/\&\#([0-9]+);/uc('%'.unpack("H2", pack("c", $1)))/eg;.)

http://www.endmemo.com/unicode/unicodeconverter.php

Also, I need to convert it back again from UTF-8 to decimal NCRs.

I've been breaking my head over this one for half a day now; any help is greatly appreciated!

What you call "UTF-8" is actually URL-encoding (percent-encoding of the UTF-8 bytes).
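
To make the distinction concrete, here is a minimal sketch using only the core Encode module. The variable names are purely illustrative. It takes the code point from the first NCR, encodes it as UTF-8 bytes, and only then percent-escapes each byte:

```perl
use strict;
use warnings;
use Encode qw( encode );

my $char  = chr(26085);                  # code point from &#26085; (U+65E5)
my $bytes = encode('UTF-8', $char);      # UTF-8 proper: the three bytes E6 97 A5
my $hex   = uc(unpack('H*', $bytes));    # "E697A5"
(my $escaped = $hex) =~ s/(..)/%$1/g;    # URL-encoding: one %XX per byte

printf "U+%04X => %s => %s\n", ord($char), $hex, $escaped;
# prints: U+65E5 => E697A5 => %E6%97%A5
```

This also shows why pack("c", $1) cannot work here: "c" packs a single signed byte, and 26085 needs three bytes in UTF-8.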


HTML entities (&#26085;) ⇒ text (日) ⇒ URI component (%E6%97%A5):

use HTML::Entities qw( decode_entities );
use URI::Escape    qw( uri_escape_utf8 );

my $text = decode_entities($html);
my $uri_component = uri_escape_utf8($text);

URI component (%E6%97%A5) ⇒ text (日) ⇒ HTML entities (&#26085;):

use Encode         qw( decode_utf8 );
use HTML::Entities qw( encode_entities );
use URI::Escape    qw( uri_unescape );

my $text = decode_utf8(uri_unescape($uri_component));
my $html = encode_entities($text);
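
One caveat, as far as I can tell from HTML::Entities: encode_entities writes named entities where one exists and hexadecimal references (such as &#x65E5;) otherwise, so it may not give you decimal NCRs by itself. If decimal is a hard requirement, a small substitution can do that last step. This is only a sketch: to_decimal_ncr is a made-up helper name, and it assumes that only non-ASCII characters need converting.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: replace every non-ASCII character with a decimal NCR.
# ASCII characters, including &, < and >, are left untouched.
sub to_decimal_ncr {
    my ($text) = @_;
    (my $html = $text) =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $html;
}

my $text = decode('UTF-8', "\xE6\x97\xA5\xE6\x9C\xAC");   # the UTF-8 bytes of 日本
print to_decimal_ncr($text), "\n";                        # &#26085;&#26412;
```

If the text can contain &, < or >, escape those separately before using the result as HTML.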
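
A second approach uses only Encode and regular expressions, written as a self-checking test script:

#!/usr/bin/perl
use strict;
use warnings;

use Test::More tests => 2;
use Encode qw{ encode decode };

my $in = '&#26085;&#26412;&#12398;&#37756;&#28792;&#12392;&#12399;'; # 日本の鍼灸とは
my $out = '%E6%97%A5%E6%9C%AC%E3%81%AE%E9%8D%BC%E7%81%B8%E3%81%A8%E3%81%AF';

# Decimal NCRs => characters.
(my $utf = $in) =~ s/&#([0-9]+);/chr $1/ge;

# Characters => UTF-8 bytes => one zero-padded %XX escape per byte.
my $r = join q(), map { sprintf '%%%02X', ord($_) } split //, encode('UTF-8', $utf);
is($r, $out);

# %XX escapes => bytes => characters => decimal NCRs.
(my $s = $r) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
$s = decode('UTF-8', $s);
$s = join q(), map '&#' . ord($_) . ';', split //, $s;
is($s, $in);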
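
Since the question asked for a single regex, the forward direction can also be folded into one s///ge. The following is a sketch rather than a drop-in solution: ncr_to_percent is a made-up name, and it rewrites decimal NCRs only, leaving any other text in the string unescaped.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: each decimal NCR becomes the percent-escaped
# UTF-8 bytes of its code point; everything else is passed through.
sub ncr_to_percent {
    my ($html) = @_;
    (my $escaped = $html) =~ s{&#([0-9]+);}{
        join '', map { sprintf '%%%02X', ord($_) } split //, encode('UTF-8', chr $1)
    }ge;
    return $escaped;
}

print ncr_to_percent('&#26085;&#26412;'), "\n";   # %E6%97%A5%E6%9C%AC
```

The /x flag is deliberately left off, because under /x the # in the pattern would start a comment.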
