Perl not printing the special characters

Question

My scraped content is not displaying special characters correctly. It shows junk values in their place (€ is printed as -aA). Thanks in advance.

```
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder::XPath;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(agent => "Mozilla/5.0");
my $req = HTTP::Request->new(GET => 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html');
my $res = $ua->request($req);

die("error") unless $res->is_success;

my $xp = HTML::TreeBuilder::XPath->new_from_content($res->content);
my @node =  $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
die("node doesn't exist") if $#node == -1; # Line 18
open HTML, ">C:/Users/jeyakuma/Desktop/kjk.html";
foreach(<@node>)
{
    print HTML "$_";
}
close HTML;
```



Answer 1

Here are some observations on your code that I hope will help you:

You must always check that a call to `open` succeeded; otherwise your program will just carry on silently without any input or output. Rather than the idiomatic `open ... or die $!`, you may prefer simply to add `use autodie` at the top of your code.
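A minimal sketch of both styles, using only core modules. It deliberately opens a path inside a directory that does not exist (`no_such_dir` is a hypothetical name chosen only so that `open` fails) so you can see what each style reports.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# A path inside a directory that does not exist, so open() must fail.
# 'no_such_dir' is a hypothetical name used only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'no_such_dir', 'out.html');

# Style 1: check the return value of open yourself and report $! (the OS error).
my $explicit_error = '';
unless (open my $fh, '>', $path) {
    $explicit_error = "Can't open '$path' for writing: $!";
}
print "explicit check: $explicit_error\n";

# Style 2: autodie turns the failed open into an exception.
my $autodie_error;
{
    use autodie;    # lexically scoped: only affects this block
    eval { open my $fh, '>', $path; 1 } or $autodie_error = $@;
}
print "autodie: $autodie_error\n";
```

With `use autodie` at the top of a real script you would normally not catch the exception at all; the program simply dies with that message.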
If the HTTP request fails, it is more informative for your program to say why it failed instead of just saying "error". I suggest you write this instead:
```
 $res->is_success or die $res->status_line; 
```
If you don't need any special LWP or parser options, you can simply write:
```
my $url = 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html';
my $xp  = HTML::TreeBuilder::XPath->new_from_url($url);
```
although that doesn't give you any way to specify the user agent string as you do currently.
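If you want to keep `LWP::UserAgent` for its custom `agent` string, one option is to keep your original request code but parse `$res->decoded_content` instead of `$res->content`, i.e. `HTML::TreeBuilder::XPath->new_from_content($res->decoded_content)`. `content` returns the raw bytes of the response body, while `decoded_content` decodes them using the charset the response declares. The offline sketch below builds a hypothetical response by hand rather than fetching the page, assuming the site serves UTF-8 and says so in its `Content-Type` header.

```perl
use strict;
use warnings;
use HTTP::Response;

# Offline stand-in for what $ua->request($req) returns: a UTF-8 page whose
# Content-Type header declares the charset. The body is raw bytes.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "<p>Spedizione: \xE2\x82\xAC 5</p>",
);

my $raw     = $res->content;           # bytes: the euro sign is 3 bytes long
my $decoded = $res->decoded_content;   # characters: the euro sign is U+20AC

printf "content: %d chars, decoded_content: %d chars\n",
    length $raw, length $decoded;
```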
Rather than testing `$#node` for equality with -1, it is much neater to test `@node` itself, which is false when the array is empty, so:
```
die "node doesn't exist" unless @node; # Line 18
```
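The two tests are equivalent because `$#array` is the index of the last element (so -1 for an empty array) and an array in boolean context evaluates to its element count. A tiny sketch with made-up arrays:

```perl
use strict;
use warnings;

my @empty = ();
my @found = ('first match');

# $#array is the index of the last element, so it is -1 for an empty array.
print 'last index of empty array: ', $#empty, "\n";

# An array in boolean context evaluates to its element count,
# so an empty array is false and a non-empty one is true.
my $empty_is_true = @empty ? 1 : 0;
my $found_is_true = @found ? 1 : 0;
print "empty: $empty_is_true, found: $found_is_true\n";
```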
If your data contains non-ASCII characters such as €, your output filehandle must be set to an encoding that can represent them, here UTF-8. You can change the mode using `binmode`, like this:
```
open HTML, ">C:/Users/jeyakuma/Desktop/kjk.html";
binmode HTML, ':encoding(utf-8)';
```
But the best way is to use the preferred three-argument form of `open`, which would look like this, assuming that you have `use autodie` in place at the start of your program:
```
open HTML, '>:encoding(utf-8)', 'C:/Users/jeyakuma/Desktop/kjk.html';
```
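The encoding layer only produces the right bytes if the text was decoded into Perl characters on the way in. This sketch uses only core modules and in-memory filehandles (not your kjk.html file) to show what happens to a single € in each case; the second result is double-encoded, the kind of junk that turns one character into several odd ones.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The euro sign as it arrives from a UTF-8 page: three bytes.
my $bytes = "\xE2\x82\xAC";

# Decoded into a Perl character string it is a single character, U+20AC.
my $chars = decode('UTF-8', $bytes);

# Write both strings through an :encoding(UTF-8) layer into in-memory
# buffers (open on a scalar reference) so the resulting bytes can be inspected.
open my $good_fh, '>:encoding(UTF-8)', \my $good_out or die "Can't open: $!";
print $good_fh $chars;
close $good_fh;

open my $bad_fh, '>:encoding(UTF-8)', \my $bad_out or die "Can't open: $!";
print $bad_fh $bytes;    # undecoded bytes get encoded a second time
close $bad_fh;

printf "decoded then encoded: %s\n", unpack('H*', $good_out);   # e282ac
printf "encoded twice:        %s\n", unpack('H*', $bad_out);    # c3a2c282c2ac
```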
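Lexical filehandles (`open my $fh, ...`) are far superior to the old-fashioned global (bareword) filehandles such as `HTML`: they are scoped to the enclosing block, are closed automatically when they go out of scope, and cannot clash with a filehandle of the same name elsewhere in the program.
The loop `foreach (<@node>) { ... }` is wrong because it is equivalent to `foreach (glob join ' ', @node) { ... }`, and it only appears to work because, in general, `glob` leaves a filename untouched if it doesn't contain any wildcards. Note also that `glob` splits its argument on whitespace, so any text containing spaces is broken into separate words. What you meant was just `for (@node) { ... }`.
In addition, it is bad practice to enclose a variable in quotes unless you specifically want to stringify it, so `"$_"` should be just `$_`.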
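A small sketch with two made-up strings (hypothetical, not taken from the site) shows what `<@node>` does to text compared with iterating over the array directly:

```perl
use strict;
use warnings;

# Two hypothetical scraped strings: the first contains spaces, the second braces.
my @node = ('Costi di spedizione', 'Tempi {3,5} giorni');

# <@node> interpolates @node (joined with spaces) and passes the result to glob,
# which splits it on whitespace and expands {a,b} alternatives.
my @via_glob = <@node>;

# Iterating over the array directly keeps each string intact.
my @direct = @node;

print scalar(@via_glob), ' items via glob: ', join('|', @via_glob), "\n";
print scalar(@direct),   ' items directly: ', join('|', @direct),   "\n";
```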
You may as well replace the whole output loop with:
```
print HTML @node;
```
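Because `print` takes a list, this writes every element in order. A sketch with a lexical filehandle, writing to an in-memory string (a stand-in for the real output file) so the result is easy to check:

```perl
use strict;
use warnings;

my @node = ('first block', 'second block');

# A lexical filehandle: $out_fh is an ordinary lexical variable, closed
# automatically when it goes out of scope. It writes to an in-memory string
# here instead of a real file.
open my $out_fh, '>:encoding(UTF-8)', \my $output or die "Can't open: $!";

# print accepts a list; with $, (the output field separator) unset,
# the elements are written back to back with nothing in between.
print $out_fh @node;
close $out_fh;

print "$output\n";
```

If you want each block on its own line, `print $out_fh map { "$_\n" } @node;` adds a newline after each element.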

Putting these changes in place, the result looks like this, which I believe will fix your problem:

```
use strict;
use warnings;
use autodie;    # open, close, etc. now die with a useful message on failure

use HTML::TreeBuilder::XPath;

my $url = 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html';
my $xp  = HTML::TreeBuilder::XPath->new_from_url($url);

# Text content of each <div class="mainbox-body"> on the page
my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
die "node doesn't exist" unless @node;

# Three-argument open with a lexical filehandle that encodes output as UTF-8
open my $html_fh, '>:encoding(utf-8)', 'C:/Users/jeyakuma/Desktop/kjk.html';
print $html_fh @node;
close $html_fh;
```

Answer 1: accepted, 2014-06-23 17:22:18

Perl not printing the special characters

Question

1 answers

solution1 1 ACCPTED 2014-06-23 17:22:18

solution1
1 ACCPTED 2014-06-23 17:22:18