Regular expression replace

I need a regex that will:

  • remove all symbols
  • allow at most 1 hyphen in a row (runs of hyphens collapse to one)
  • allow at most 1 period in total

Examples:

  • Mike&Ike output is: MikeIke
  • Mike-Ike output is: Mike-Ike
  • Mike-Ike-Jill output is: Mike-Ike-Jill
  • Mike--Ike-Jill output is: Mike-Ike-Jill
  • Mike--Ike---Jill output is: Mike-Ike-Jill
  • Mike.Ike.Bill output is: Mike.IkeBill
  • Mike***Joe output is: MikeJoe
  • Mike123 output is: Mike123
#!/usr/bin/env perl

use 5.10.0;
use strict;
use warnings;

my @samples = (
    "Mike&Ike"          => "MikeIke",
    "Mike-Ike"          => "Mike-Ike",
    "Mike-Ike-Jill"     => "Mike-Ike-Jill",
    "Mike--Ike-Jill"    => "Mike-Ike-Jill",
    "Mike--Ike---Jill"  => "Mike-Ike-Jill",
    "Mike.Ike.Bill"     => "Mike.IkeBill",
    "Mike***Joe"        => "MikeJoe",
    "Mike123"           => "Mike123",
);

while (my($got, $want) = splice(@samples, 0, 2)) {
    my $had = $got;
    for ($got) {
        # 1) Allow max 1 dashy bit connected to each other.
        s/ ( \p{Dash} ) \p{Dash}+                           /$1/xg;
        # 2) Allow max 1 period, total.
        1 while s/ ^ [^.]* \. [^.]* \K \.                   //x   ;
        # 3) Remove all symbols and punctuation,
        #    except for dashy bits and dots.
        s/ (?! [\p{Dash}.] ) [\p{Symbol}\p{Punctuation}]    //xg  ;
    }

    if ($got eq $want) { print "RIGHT" }
    else               { print "WRONG" }
    print ":\thad\t<$had>\n\twanted\t<$want>\n\tgot\t<$got>\n";
}

Generates:

RIGHT:  had <Mike&Ike>
    wanted  <MikeIke>
    got <MikeIke>
RIGHT:  had <Mike-Ike>
    wanted  <Mike-Ike>
    got <Mike-Ike>
RIGHT:  had <Mike-Ike-Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike--Ike-Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike--Ike---Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike.Ike.Bill>
    wanted  <Mike.IkeBill>
    got <Mike.IkeBill>
RIGHT:  had <Mike***Joe>
    wanted  <MikeJoe>
    got <MikeJoe>
RIGHT:  had <Mike123>
    wanted  <Mike123>
    got <Mike123>
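Step 2 above uses `1 while s/.../\K.../` to rescan the string from the start once for every extra period. As an alternative sketch (not part of the original answer), a single `s///ge` with a counter keeps the first period and drops the rest in one pass. `keep_first_period` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first '.' and delete every later one.
# The /e flag evaluates the replacement as Perl code for each match;
# $seen++ returns 0 (false) only for the first period.
sub keep_first_period {
    my ($s) = @_;
    my $seen = 0;
    $s =~ s/\./$seen++ ? '' : '.'/ge;
    return $s;
}

print keep_first_period("Mike.Ike.Bill"), "\n";   # Mike.IkeBill
print keep_first_period("a..b.c"), "\n";           # a.bc
```

This replaces only the period rule; the dash and symbol passes stay as they are.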

You could do it in several passes.
It's a kind of generic workaround that could be shortened by using lookbehind
(not all regex flavors support this):

  1. replace runs of - with a single - using the regex -{2,}
  2. remove symbols except - and . with the regex [^-.A-Za-z0-9]
  3. replace the first . with a temporary character, e.g. !, and remove the remaining .
  4. replace the ! from the last step with .
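A minimal Perl sketch of the four passes listed above, assuming ASCII input (pass 2 keeps only `A-Za-z0-9`, `-` and `.`). `clean_multipass` is a hypothetical name. The temporary `!` marker is safe here only because pass 2 has already removed every `!` from the string, so the marker cannot collide with real input.

```perl
use strict;
use warnings;

# Hypothetical helper: the four passes from the list above, in order.
# Assumes ASCII input; pass 2 keeps only A-Za-z0-9, '-' and '.'.
sub clean_multipass {
    my ($s) = @_;
    $s =~ s/-{2,}/-/g;           # 1. collapse runs of two or more hyphens
    $s =~ s/[^-.A-Za-z0-9]//g;   # 2. delete everything except letters, digits, '-', '.'
    $s =~ s/\./!/;               # 3a. mark the first '.' with a temporary '!'
    $s =~ s/\.//g;               # 3b. delete every remaining '.'
    $s =~ s/!/./;                # 4. turn the marker back into '.'
    return $s;
}

print clean_multipass("Mike--Ike---Jill"), "\n";   # Mike-Ike-Jill
print clean_multipass("Mike.Ike.Bill"), "\n";      # Mike.IkeBill
```

Because hyphens are collapsed before symbols are removed, an input like `Mike-&-Ike` comes out as `Mike--Ike`; running pass 2 before pass 1 avoids that.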

Update using C# .NET
(I'm not a C# programmer; I used this regex tester and this reference for the C# .NET regex flavor.)

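using System.Text.RegularExpressions;

string str = "Mike&Ike ......";
str = Regex.Replace(str, @"-+", "-");               // collapse runs of hyphens
str = Regex.Replace(str, @"(?<=\.)(.*?)\.", "$1");  // drop every '.' after the first
str = Regex.Replace(str, @"[^\w.\r\n-]", "");       // keep only word chars, '.', '-', newlines
  1. replace multiple - with a single -
  2. remove a . if it's not the first . using positive lookbehind (?<=...)
  3. remove symbols (actually everything that is not a word character, ., - or a newline). For ASCII input \w is [A-Za-z0-9_], so _ is kept too (in .NET it also matches other Unicode letters and digits). The . and - must be in the class, otherwise this pass would delete the hyphens and the period kept by passes 1 and 2.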
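For comparison with the Perl answer, here is a sketch of the same three lookbehind-based passes in Perl, run against a subset of the question's examples. `clean_lookbehind` is a hypothetical name. The last pass keeps `.` and `-` alongside word characters; without them it would delete the hyphens and the period that the first two passes preserved. Perl supports the fixed-width lookbehind `(?<=\.)` used here.

```perl
use strict;
use warnings;

# Hypothetical helper: the three lookbehind-based passes, ported to Perl.
sub clean_lookbehind {
    my ($s) = @_;
    $s =~ s/-+/-/g;                 # collapse runs of hyphens to one
    $s =~ s/(?<=\.)(.*?)\./$1/g;    # delete every '.' after the first one on the same line
    $s =~ s/[^\w.\r\n-]//g;         # delete all but word chars, '.', '-', newlines
    return $s;
}

my @checks = (
    "Mike&Ike"         => "MikeIke",
    "Mike--Ike---Jill" => "Mike-Ike-Jill",
    "Mike.Ike.Bill"    => "Mike.IkeBill",
    "Mike***Joe"       => "MikeJoe",
    "Mike123"          => "Mike123",
);
while (my ($in, $want) = splice(@checks, 0, 2)) {
    my $got = clean_lookbehind($in);
    print(($got eq $want ? "RIGHT" : "WRONG"), ": <$in> -> <$got>\n");
}
```

Since `.` in `(.*?)` does not match a newline, this pass only removes extra periods within a single line.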

Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM