Translate Perl regular expressions to .NET

I have some useful regular expressions in Perl. Is there a simple way to translate them to .NET's dialect of regular expressions?

If not, is there a concise reference to the differences?

There is a large comparison table at http://www.regular-expressions.info/refflavors.html.


Most of the basic elements are the same; the differences are:

Minor differences:

  • Unicode escape sequences. In .NET it is \u200A, in Perl it is \x{200A}.
  • \v in .NET is just the vertical tab (U+000B); in Perl it stands for the "vertical whitespace" class. Accordingly, Perl also has \V (any character that is not vertical whitespace).
  • The conditional expression for a named reference is (?(name)yes|no) in .NET, but (?(<name>)yes|no) in Perl.
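A minimal C# sketch of the three minor differences above, assuming the patterns are handed to System.Text.RegularExpressions.Regex. The Perl spelling is shown in the comments; the inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Perl: \x{200A}   .NET: \u200A (always exactly four hex digits)
bool hairSpace = Regex.IsMatch("a\u200Ab", @"a\u200Ab");

// Perl's \v is a vertical-whitespace class; in .NET it matches only U+000B.
bool vtMatchesNewline = Regex.IsMatch("\n", @"\v");   // False in .NET
bool vtMatchesVt = Regex.IsMatch("\v", @"\v");        // True

// Perl: (?(<open>)\))   .NET: (?(open)\))
// An optional opening parenthesis must be closed if (and only if) it was present.
var paren = new Regex(@"^(?<open>\()?\d+(?(open)\))$");
bool withParens = paren.IsMatch("(42)");      // True
bool withoutParens = paren.IsMatch("42");     // True
bool unbalanced = paren.IsMatch("(42");       // False

Console.WriteLine($"{hairSpace} {vtMatchesNewline} {vtMatchesVt} {withParens} {withoutParens} {unbalanced}");
```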

Some elements are Perl-only:

  • Possessive quantifiers (x?+, x*+, x++, etc.). Use a non-backtracking (atomic) subexpression (?>…) instead.
  • Named Unicode escape sequences \N{LATIN SMALL LETTER X}, \N{U+200A}.
  • Case folding and escaping:
    • \l (lower-case the next character), \u (upper-case the next character).
    • \L (lower case), \U (upper case), \Q (quote metacharacters), each until \E.
  • Shorthand notation for Unicode properties, \pL and \PL. In .NET you have to include the braces, e.g. \p{L}.
  • Odd things like \X (extended grapheme cluster) and \C (single byte).
  • Special character classes like \v, \V, \h, \H, \N, \R.
  • Backreferences to a specific or a preceding group, \g1 and \g{-1}. In .NET you can only use an absolute group index, e.g. \1.
  • Named backreference \g{name}. Use \k<name> instead.
  • POSIX character classes such as [[:alpha:]].
  • Branch-reset groups (?|…).
  • \K. Use a look-behind (?<=…) instead.
  • Code evaluation assertion (?{…}) and postponed subexpression (??{…}).
  • Subexpression references (recursive patterns): (?0), (?R), (?1), (?-1), (?+1), (?&name).
  • Some conditional-expression predicates are Perl-specific:
    • code (?{…})
    • recursion (R), (R1), (R&name)
    • define (DEFINE)
  • Special backtracking control verbs (*VERB:ARG).
  • Python syntax:
    • (?P<name>…). Use (?<name>…) instead.
    • (?P=name). Use \k<name> instead.
    • (?P>name). No equivalent in .NET.
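Several of the Perl-only constructs above have a direct .NET rewrite. The sketch below puts the suggested substitutions side by side (Perl form in the comments, .NET form in the code); the sample strings are illustrative assumptions, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Possessive quantifier. Perl: ^\d++\d$   .NET: ^(?>\d+)\d$
// The atomic group keeps all the digits, so the final \d can never match.
bool atomic = Regex.IsMatch("123", @"^(?>\d+)\d$");   // False
bool greedy = Regex.IsMatch("123", @"^\d+\d$");       // True: \d+ gives one digit back

// \K. Perl: foo\Kbar   .NET: (?<=foo)bar
string afterFoo = Regex.Match("foobar", @"(?<=foo)bar").Value;   // "bar"

// Unicode property shorthand. Perl: \pL+   .NET: \p{L}+
string letters = Regex.Match("123 h\u00E9llo", @"\p{L}+").Value;   // "h\u00E9llo"

// Relative backreference. Perl: (\w)\g{-1}   .NET: (\w)\1 (absolute index only)
bool doubled = Regex.IsMatch("book", @"(\w)\1");   // True ("oo")

// Python-style names and \g{name}. Perl: (?P<q>['"]).*?\g{q}   .NET: (?<q>['"]).*?\k<q>
string quoted = Regex.Match("say \"hi\" now", @"(?<q>['""]).*?\k<q>").Value;   // "\"hi\""

Console.WriteLine($"{atomic} {greedy} {afterFoo} {letters} {doubled} {quoted}");
```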

Some elements are .NET-only:

  • Variable-length look-behind. In Perl, use \K instead for a positive look-behind. (Perl 5.30 and later accept variable-length look-behind experimentally, provided it is at most 255 characters long.)
  • An arbitrary regular expression as the condition of a conditional expression, (?(pattern)yes|no).
  • Character class subtraction, [a-z-[d-w]].
  • Balancing groups, (?<-name>…). These could be simulated with a code evaluation assertion (?{…}) followed by a (?&name).
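The .NET-only features can be tried with a short C# sketch. The balancing-group pattern is the commonly used idiom for checking nested parentheses, chosen here for illustration; it is not taken from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Variable-length look-behind: \s* inside (?<=...) is accepted by .NET.
string amount = Regex.Match("price: USD   42", @"(?<=USD\s*)\d+").Value;   // "42"

// Character class subtraction: a-z minus d-w leaves a, b, c, x, y, z.
var subtract = new Regex(@"^[a-z-[d-w]]+$");
bool keptLetters = subtract.IsMatch("abcxyz");    // True
bool removedLetters = subtract.IsMatch("hello");  // False

// Balancing groups: push a capture on "(", pop one on ")",
// and fail at the end if any "(" is still open.
var balanced = new Regex(@"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");
bool nested = balanced.IsMatch("(a(b)c)");     // True
bool unclosed = balanced.IsMatch("(a(b c)");   // False
bool reversed = balanced.IsMatch("a)b(");      // False

Console.WriteLine($"{amount} {keptLetters} {removedLetters} {nested} {unclosed} {reversed}");
```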

.NET regular expressions were designed to be compatible with Perl 5 regexes, so many Perl 5 patterns should work in .NET as-is.

You can translate Perl's modifiers to some of the RegexOptions values as follows:

// System.Text.RegularExpressions.RegexOptions, annotated with the matching Perl modifiers
[Flags]
public enum RegexOptions
{
  Compiled = 8,
  CultureInvariant = 0x200,
  ECMAScript = 0x100,
  ExplicitCapture = 4,            // n in Perl 5.22+
  IgnoreCase = 1,                 // i in Perl
  IgnorePatternWhitespace = 0x20, // x in Perl
  Multiline = 2,                  // m in Perl
  None = 0,
  RightToLeft = 0x40,
  Singleline = 0x10               // s in Perl
}
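Inline modifiers such as (?i), (?m), (?s) and (?x) are accepted by both engines, so they can often be copied verbatim. When the modifiers sit outside the pattern (/.../mix), a small mapping function is one way to carry them over. FromPerlModifiers below is a hypothetical helper, not part of .NET; it accepts only the letters from the table above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): maps Perl modifier letters onto RegexOptions.
// Letters without a counterpart are rejected; /g, for instance, has no option and
// corresponds to calling Regex.Matches instead of Regex.Match.
static RegexOptions FromPerlModifiers(string modifiers)
{
    var options = RegexOptions.None;
    foreach (char c in modifiers)
    {
        options |= c switch
        {
            'i' => RegexOptions.IgnoreCase,
            'm' => RegexOptions.Multiline,
            's' => RegexOptions.Singleline,
            'x' => RegexOptions.IgnorePatternWhitespace,
            'n' => RegexOptions.ExplicitCapture,   // Perl 5.22+
            _ => throw new ArgumentException($"No RegexOptions equivalent for /{c}", nameof(modifiers)),
        };
    }
    return options;
}

const string text = "first line\nHELLO World\nlast line";

// Perl: $text =~ /^ hello \s world $/mix
bool viaOptions = new Regex(@"^ hello \s world $", FromPerlModifiers("mix")).IsMatch(text);

// The same modifiers written inline work unchanged in .NET.
bool viaInline = Regex.IsMatch(text, @"(?mix)^ hello \s world $");

Console.WriteLine($"{viaOptions} {viaInline}");   // True True
```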

Another tip is to use verbatim strings so that you don't need to double every backslash in C#:

string badOnTheEyesRx    = "\\d{4}/\\d{2}/\\d{2}";
string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";

It really depends on the complexity of the regular expression; many will work the same out of the box.

Take a look at a .NET regex cheat sheet to see whether an operator does what you expect it to do.

I don't know of any tool that automatically translates between RegEx dialects.
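Without such a tool, the purely syntactic rewrites can be automated to some extent. TranslatePerlSyntax below is a hypothetical, deliberately naive sketch: it uses string replacement rather than a real parser, so it can mistranslate an escaped backslash (\\pL) or a sequence inside a character class, and it leaves constructs it does not handle (possessive quantifiers, \g{-1}, recursion, \K) untouched, which .NET will typically reject rather than reinterpret.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites a handful of purely syntactic Perl-isms into .NET syntax.
// Naive: it does not parse the pattern, so an escaped backslash (\\pL) or a sequence
// inside a character class may be rewritten incorrectly.
static string TranslatePerlSyntax(string perl)
{
    string s = perl;

    // \x{41} / \x{200A} -> \u0041 / \u200A (only code points with at most four hex digits)
    s = Regex.Replace(s, @"\\x\{([0-9A-Fa-f]{1,4})\}",
        m => "\\u" + m.Groups[1].Value.PadLeft(4, '0'));

    // \pL / \PL -> \p{L} / \P{L} (Perl's one-letter property shorthand)
    s = Regex.Replace(s, @"\\([pP])([A-Za-z])",
        m => "\\" + m.Groups[1].Value + "{" + m.Groups[2].Value + "}");

    // Python-style named groups: (?P<name>...) -> (?<name>...), (?P=name) -> \k<name>
    s = s.Replace("(?P<", "(?<");
    s = Regex.Replace(s, @"\(\?P=(\w+)\)", m => "\\k<" + m.Groups[1].Value + ">");

    // \g{name} -> \k<name>; absolute \gN / \g{N} -> \k<N>, so that a following digit
    // cannot turn \1 into \10. Relative forms such as \g{-1} are left alone.
    s = Regex.Replace(s, @"\\g\{([A-Za-z_]\w*)\}", m => "\\k<" + m.Groups[1].Value + ">");
    s = Regex.Replace(s, @"\\g\{(\d+)\}|\\g(\d+)",
        m => "\\k<" + m.Groups[1].Value + m.Groups[2].Value + ">");

    return s;
}

string translated = TranslatePerlSyntax(@"\pL+\x{200A}(?P<word>\w+)\s+(?P=word)");
Console.WriteLine(translated);   // \p{L}+\u200A(?<word>\w+)\s+\k<word>
Console.WriteLine(Regex.IsMatch("abc\u200Afoo foo", translated));   // True
```

Anything beyond these mechanical rewrites, such as possessive quantifiers or recursion, still has to be translated by hand using the lists above.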
