
Removal of special characters from a string using a Perl script

I have a string like the one below:

stringinput = Sweééééôden@

I want to get output like this:

stringoutput = Sweden

The special characters ééééô and @ have to be removed.

I am using:

$stringoutput = `echo $stringinput | sed 's/[^a-z  A-Z 0-9]//g'`;

The result I get is Sweééééôden: the ééééô is not removed.

Can you please suggest what I have to add?

There is no need to call sed from Perl; Perl can do the substitution itself. It is also faster, because you don't need to start a new process.

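#!/usr/bin/perl
use warnings;
use strict;
use utf8;    # the source file contains UTF-8 literals

my $string = 'Sweééééôden@';
# Delete every character that is not an ASCII letter or digit.
$string =~ s/[^A-Za-z0-9]//g;
print "$string\n";    # Sweden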

You need to put LC_ALL=C before the sed command so that the [A-Za-z] character class builds its ranges according to the ASCII table:

stringoutput=$(echo $stringinput | LC_ALL=C sed 's/[^A-Za-z0-9]//g')

See the online demo:

stringinput='Sweééééôden@';
stringoutput=$(echo $stringinput | LC_ALL=C sed 's/[^A-Za-z0-9]//g');
echo "$stringoutput";
# => Sweden
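The demo above passes `$stringinput` to `echo` unquoted, which works for this input but would be subject to word splitting and globbing for others. A minimal sketch of the same sed call wrapped in a function, assuming a POSIX shell; the function name `strip_non_alnum` is hypothetical and not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper: deletes every byte that is not an ASCII letter or digit.
strip_non_alnum() {
    # printf with a quoted argument avoids word splitting and echo's
    # option/escape handling; LC_ALL=C applies only to the sed process.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]//g'
}

stringinput='Sweééééôden@'
stringoutput=$(strip_non_alnum "$stringinput")
printf '%s\n' "$stringoutput"
# => Sweden
```

Setting `LC_ALL=C` as a prefix on the sed command keeps the locale change local to that one process, so the rest of the script still runs in the caller's locale.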

See the POSIX regex reference:

In the default C locale, the sorting sequence is the native character order; for example, '[a-d]' is equivalent to '[abcd]'. In other locales, the sorting sequence is not specified, and '[a-d]' might be equivalent to '[abcd]' or to '[aBbCcDd]', or it might fail to match any character, or the set of characters that it matches might even be erratic. To obtain the traditional interpretation of bracket expressions, you can use the 'C' locale by setting the LC_ALL environment variable to the value 'C'.
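To illustrate the quoted passage, here is a small sketch of two equivalent ways to rely on the C locale, assuming a POSIX shell with standard `sed` and `tr`. The variable names `with_class` and `with_tr` are illustrative only. In the C locale each byte is treated as one character and `[:alnum:]` covers only ASCII letters and digits, so the bytes that make up é and ô are all deleted.

```shell
#!/bin/sh
# Sketch only: variable names are illustrative, not from the original answer.
stringinput='Sweééééôden@'

# POSIX character class: in the C locale, [:alnum:] is A-Z, a-z and 0-9 only.
with_class=$(printf '%s\n' "$stringinput" | LC_ALL=C sed 's/[^[:alnum:]]//g')

# Same idea with tr: -c complements the set, -d deletes the complement.
with_tr=$(printf '%s' "$stringinput" | LC_ALL=C tr -cd 'A-Za-z0-9')

printf '%s\n%s\n' "$with_class" "$with_tr"
# prints "Sweden" on both lines
```

Without `LC_ALL=C`, what `[:alnum:]` and `A-Z` match depends on the active locale, which is why the accented letters can survive in a UTF-8 environment.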

In Perl, you could simply use:

my $stringinput = 'Sweééééôden@';
my $stringoutput = $stringinput =~ s/[^A-Za-z0-9]+//gr;
print $stringoutput;

See this online demo.
