Regex on <STDIN> not finding matches - Perl

I'm trying to make a script that would take in a string with accented characters, and return their unaccented counterparts.

I managed to make something that almost works after looking around for some help and tutorials, but I have a problem.

My code does what I want as long as I apply it to a string written directly in the script, but it does absolutely nothing to a string read from <STDIN>.

My code:

use strict;
use warnings;


my %replace = (
    'é' => "e",
    'á' => "a",
    'ő' => "o",
    'ö' => "o",
    'ó' => "o",
    'ú' => "u",
    'ü' => "u",
    'ű' => "u",
    'í' => "i",    
);

my $regex = join "|", keys %replace;
$regex = qr/$regex/;

my $s = <STDIN>;
$s = substr $s, 0, length($s) - 1;

my $var = "$s - öüóőúéáű";

$var =~ s/($regex)/$replace{$1}/g;

$s = $var;

print "$s\n";

If I input "öüóőúéáű" on <STDIN>, I get the following output:

öüóőúéáű - ouooueau

Could someone tell me what I'm doing wrong?

EDIT:

I checked, and when I use it as follows (with <DATA> instead of <STDIN>) it works properly:

use strict;
use warnings;

my %replace = (
    'é' => "e",
    'á' => "a",
    'ő' => "o",
    'ö' => "o",
    'ó' => "o",
    'ú' => "u",
    'ü' => "u",
    'ű' => "u",
    'í' => "i",    
);

my $regex = join "|", keys %replace;
$regex = qr/$regex/;

my $s = <DATA>;
$s = substr $s, 0, length($s) - 1;

my $var = "$s - öüóőúéáű";

$var =~ s/($regex)/$replace{$1}/g;

$s = $var;

print "$s\n";

__DATA__
öüóőúéáű

EDIT2:

I then changed the read to my $s = <DATA>." - ".<>; so that it takes input from both <DATA> and <STDIN>. The substitution still works on the <DATA> part and does nothing to the <STDIN> part, so I get the following output:

uaeuoouoi - űáéúőóüöí - uaeuoouoi

It comes from this code:

use strict;
use warnings;
use utf8;

my %replace = (
    'é' => "e",
    'á' => "a",
    'ő' => "o",
    'ö' => "o",
    'ó' => "o",
    'ú' => "u",
    'ü' => "u",
    'ű' => "u",
    'í' => "i",    
);

my $regex = join "|", keys %replace;
$regex = qr/$regex/;



my $s = <DATA>." - ".<>;
$s = substr $s, 0, length($s) - 1;

my $var = "$s - űáéúőóüöí";

$var =~ s/($regex)/$replace{$1}/g;

$s = $var;

print "$s\n";

__DATA__
űáéúőóüöí

Here the line typed on <STDIN> was űáéúőóüöí

When I run your program with <DATA>, I get the expected result:

use strict;
use warnings;


my %replace = (
    'é' => "e",
    'á' => "a",
    'ő' => "o",
    'ö' => "o",
    'ó' => "o",
    'ú' => "u",
    'ü' => "u",
    'ű' => "u",
    'í' => "i",    
);

my $regex = join "|", keys %replace;
$regex = qr/$regex/;

my $s = <DATA>;
$s = substr $s, 0, length($s) - 1;

my $var = "$s - öüóőúéáű";

$var =~ s/($regex)/$replace{$1}/g;

$s = $var;

print "$s\n";

__DATA__
öüóőúéáű

Where I get:

$ perl test.pl
ouooueau - ouooueau

So the regex itself is fine and the problem lies elsewhere, most likely in how the input is encoded.
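
To see why encoding matters here, the following is a minimal sketch, assuming the input arrives as UTF-8. It uses the core Encode module to imitate what an undecoded read returns (raw bytes) and compares that with the decoded form. The variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                        # literals in this file are decoded to characters
use Encode qw(encode decode);

# Imitate a read from a handle with no decoding layer: raw UTF-8 bytes.
my $raw_bytes = encode('UTF-8', 'é');
# The same data after decoding into Perl characters.
my $decoded   = decode('UTF-8', $raw_bytes);

# Under `use utf8` this pattern is the single character U+00E9.
my $pattern = qr/é/;

printf "raw length:      %d\n", length $raw_bytes;    # 2 (bytes 0xC3 0xA9)
printf "decoded length:  %d\n", length $decoded;      # 1 (character U+00E9)
printf "raw matches:     %s\n", ($raw_bytes =~ $pattern ? 'yes' : 'no');   # no
printf "decoded matches: %s\n", ($decoded   =~ $pattern ? 'yes' : 'no');   # yes
```

The pattern and the input only match when both are in the same form, either both bytes or both characters.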

You can try adding the following to your program:

use utf8;
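
Note that use utf8 only tells Perl that the source file (and so the accented literals in it) is UTF-8. It does not decode what is read from <STDIN>. A sketch of one way to handle both, assuming the terminal or pipe really sends and expects UTF-8 (on a console using another code page the layer name would have to change); strip_accents is a hypothetical helper name:

```perl
use strict;
use warnings;
use utf8;                                  # decode the accented literals below

# Assumption: input and output are UTF-8.
binmode STDIN,  ':encoding(UTF-8)';        # bytes -> characters on input
binmode STDOUT, ':encoding(UTF-8)';        # characters -> bytes on output

my %replace = (
    'é' => "e", 'á' => "a", 'ő' => "o",
    'ö' => "o", 'ó' => "o", 'ú' => "u",
    'ü' => "u", 'ű' => "u", 'í' => "i",
);

my $regex = join "|", keys %replace;
$regex = qr/$regex/;

# Hypothetical helper: replace every accented character found in %replace.
sub strip_accents {
    my ($text) = @_;
    $text =~ s/($regex)/$replace{$1}/g;
    return $text;
}

while (my $line = <STDIN>) {
    chomp $line;                           # drop the trailing newline
    print strip_accents($line), "\n";
}
```

With both the literals and the input decoded to characters, the regex compares like with like.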

You can also simplify your program like this:

use strict;
use warnings;

my %replace = (
    'é' => "e",
    'á' => "a",
    'ő' => "o",
    'ö' => "o",
    'ó' => "o",
    'ú' => "u",
    'ü' => "u",
    'ű' => "u",
    'í' => "i",    
);

while(<DATA>) {
    for my $key (keys %replace) {
        s/$key/$replace{$key}/g;   # /g replaces every occurrence, not only the first
    }
    print;
}

__DATA__
öüóőúéáű
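
Since every replacement here maps one character to one character, the loop could also be written with tr///. This is only a sketch; it needs use utf8 so that tr/// sees characters rather than UTF-8 bytes, and unaccent is a hypothetical helper name:

```perl
use strict;
use warnings;
use utf8;   # needed so tr/// operates on characters, not bytes

# Hypothetical helper: position-wise mapping, the Nth character on the
# left is replaced by the Nth character on the right.
sub unaccent {
    my ($text) = @_;
    $text =~ tr/éáőöóúüűí/eaooouuui/;
    return $text;
}

print unaccent('öüóőúéáű'), "\n";   # prints ouooueau
```

Input read from a filehandle would still have to be decoded first for this to match.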
