Unknown UTF-8 character in Perl

I want to read a string from a text file in Perl.

The code I use to read it is:

my $indPara = "C:\\Users\\Admin001\\Desktop\\paraText.txt";
open(INDPARA, $indPara) || die "Indesign paraText not found on location!";
my $indesignPara = <INDPARA>;
close INDPARA;

When I read the text, I get an unknown Unicode character (&#65279 or &#xFEFF) at the start of the text.

The text file I am reading can be downloaded from the link below:

https://mega.co.nz/#!r1pAyAhA!VSnL2tbPWoTtThcCRoiogSxK4ok_0YvZSczs054w0uU

I am using Komodo IDE 8.5 and Perl 5.16.3.

Kindly give me some idea of how to overcome this.

Thanks in advance

Vimal

What you have there is a BOM (byte order mark). It suggests that what you have is not a UTF-8 file but a UTF-16 (BE) file, which begins with the bytes FE FF.

The BOM is not part of the data in the file, so you can safely skip past it and continue to read the data beyond it. However, you should not treat the data that you read from the file as UTF-8; treat it as UTF-16 (BE) and decode it appropriately.
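One way to do that decoding at read time is an encoding layer on open. The following is only a minimal sketch, assuming the file really is UTF-16 with a BOM as suggested above. The temporary file and its "Hi" content are made-up stand-ins for paraText.txt so that the snippet runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small UTF-16BE sample file with a BOM.
# (Stand-in for paraText.txt; the content is invented for this sketch.)
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFE\xFF", "\x00H", "\x00i", "\x00\n";
close $tmp_fh or die "close failed: $!";

# ":raw" keeps the Windows CRLF layer away from the 2-byte code units.
# The generic "UTF-16" encoding uses the BOM to pick the byte order
# and does not hand it back as part of the data.
open my $in, '<:raw:encoding(UTF-16)', $path
    or die "Cannot open $path: $!";
my $first_line = <$in>;
close $in;

chomp $first_line;
printf "%s (%d characters)\n", $first_line, length $first_line;
```

With the real file, $path would simply be the path from the question. If the file turns out to be UTF-8 with a BOM instead, this layer is the wrong choice and the BOM has to be handled differently.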

If you have the entire string ($indesignPara), decode it:

use Encode qw(decode);
$indesignPara = decode("UTF-16LE", $indesignPara, Encode::FB_QUIET);

I am not sure that reading with <INDPARA> works as-is, though. You could try passing "<:encoding(UTF-16LE)" as the mode argument (the second parameter) to open, and then strip the first wide character, the BOM U+FEFF.
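Since the two answers disagree on the byte order, it may be safer to look at the first bytes before decoding. This is a rough sketch only: decode_with_bom is a hypothetical helper written for this illustration, it ignores UTF-32, and it assumes UTF-8 when no BOM is present.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: choose an encoding from the leading BOM bytes,
# drop those bytes, and decode the rest.
# Assumption: input without a BOM is UTF-8; UTF-32 is not considered.
sub decode_with_bom {
    my ($bytes) = @_;
    my @boms = (
        [ "\xEF\xBB\xBF", 'UTF-8'    ],
        [ "\xFE\xFF",     'UTF-16BE' ],
        [ "\xFF\xFE",     'UTF-16LE' ],
    );
    for my $bom (@boms) {
        my ($mark, $encoding) = @$bom;
        if (substr($bytes, 0, length $mark) eq $mark) {
            my $rest = substr($bytes, length $mark);
            return decode($encoding, $rest);
        }
    }
    return decode('UTF-8', $bytes);
}

# Sample inputs built in memory instead of read from a file.
my $from_utf8  = decode_with_bom("\xEF\xBB\xBF" . encode('UTF-8', 'abc'));
my $from_utf16 = decode_with_bom("\xFF\xFE" . encode('UTF-16LE', 'abc'));
print "both decoded without a BOM\n"
    if $from_utf8 eq 'abc' && $from_utf16 eq 'abc';
```

To use it on a file, read the bytes with binmode set on the handle and pass the whole content to the helper.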

Thank you so much, guys, for your kind help and ideas. I found a way to clear this: just find and replace with s/\x{feff}//g; and it works!
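A slightly narrower variant of that substitution removes the character only at the very start of the string, where a BOM belongs. This is a small sketch with a made-up sample string. It assumes the text has already been decoded to characters; on raw bytes, a UTF-8 BOM is the three bytes EF BB BF and this pattern would not match.

```perl
use strict;
use warnings;

# Sample decoded string with a leading BOM (invented for this sketch).
my $text = "\x{FEFF}First paragraph";

# \A anchors at the start of the string, so at most one leading BOM is
# removed and any U+FEFF further inside the text is left alone.
$text =~ s/\A\x{FEFF}//;

print length($text), " characters after stripping\n";
```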
