Translate Perl regular expressions to .NET
I have some useful regular expressions in Perl. Is there a simple way to translate them to .NET's dialect of regular expressions?
If not, is there a concise reference of differences?
There is a big comparison table at http://www.regular-expressions.info/refflavors.html .
Most of the basic elements are the same; the differences are:
Minor differences:
- Unicode escape sequences: .NET uses \u200A; in Perl it is \x{200A}.
- \v in .NET is just the vertical tab (U+000B); in Perl it stands for the "vertical whitespace" class. Of course there is \V in Perl because of this.
- The conditional expression on a named group is (?(name)yes|no) in .NET, but (?(<name>)yes|no) in Perl.

Some elements are Perl-only:
- Possessive quantifiers (x?+, x*+, x++ etc). Use a non-backtracking subexpression ((?>…)) instead.
- Named Unicode escape sequences \N{LATIN SMALL LETTER X}, \N{U+200A}.
- \l (lower case next char), \u (upper case next char).
- \L (lower case), \U (upper case), \Q (quote meta characters) until \E.
- Shorthand notation for Unicode properties \pL and \PL. You have to include the braces in .NET, e.g. \p{L}.
- Odd things like \X, \C.
- Special character classes like \v, \V, \h, \H, \N, \R.
- Backreference to a specific or previous group \g1, \g{-1}. You can only use an absolute group index in .NET.
- Named backreference \g{name}. Use \k<name> instead.
- POSIX character class [[:alpha:]].
- Branch-reset pattern (?|…).
- \K. Use look-behind ((?<=…)) instead.
- Code evaluation assertion (?{…}), postponed subexpression (??{…}).
- Subexpression reference (recursive pattern) (?0), (?R), (?1), (?-1), (?+1), (?&name).
- Some conditional expression predicates are Perl-specific:
  - code (?{…})
  - recursive (R), (R1), (R&name)
  - define (DEFINE).
- Special backtracking control verbs (*VERB:ARG).
- Python syntax:
  - (?P<name>…). Use (?<name>…) instead.
  - (?P=name). Use \k<name> instead.
  - (?P>name). No equivalent in .NET.
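A few of the substitutions listed above can be tried directly in C#. The following is only a sketch: the class name, method names, and sample inputs are made up for illustration, and each comment shows the Perl form next to the .NET form suggested in the list.

```csharp
using System;
using System.Text.RegularExpressions;

static class PerlOnlyWorkarounds
{
    // Perl: ^a++a$   ->  .NET: ^(?>a+)a$  (atomic group in place of the possessive quantifier)
    public static bool PossessiveStyle(string s) => Regex.IsMatch(s, @"^(?>a+)a$");

    // Perl: (?<q>['"]).*?\g{q}   ->  .NET: \k<q>  ("" is one literal quote in a verbatim string)
    public static string QuotedSpan(string s) => Regex.Match(s, @"(?<q>['""]).*?\k<q>").Value;

    // Perl: foo\Kbar   ->  .NET: (?<=foo)bar
    public static Match AfterFoo(string s) => Regex.Match(s, @"(?<=foo)bar");

    // Perl: \pL and \x{200A}   ->  .NET: \p{L} and \u200A
    public static bool LetterThenHairSpace(string s) => Regex.IsMatch(s, @"^\p{L}\u200A$");

    // Perl's \v is a whole class; in .NET \v is only U+000B, so the class is spelled out.
    public static bool ContainsVerticalWhitespace(string s) =>
        Regex.IsMatch(s, @"[\n\v\f\r\u0085\u2028\u2029]");

    static void Main()
    {
        // The atomic group keeps every 'a', so the trailing 'a' can never match.
        Console.WriteLine(PossessiveStyle("aaa"));
        Console.WriteLine(QuotedSpan("say \"hi\" now"));
        Console.WriteLine(AfterFoo("foobar").Index);
        Console.WriteLine(LetterThenHairSpace("\u00E9\u200A"));
        Console.WriteLine(ContainsVerticalWhitespace("line1\nline2"));
    }
}
```

Note that the look-behind version of \K only reports the part after "foo" as the match, which is the same observable result as in Perl, but the engine gets there differently.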
Some elements are .NET only:
- Variable-length look-behind. In Perl, for positive look-behind, use \K instead.
- Arbitrary regular expression in conditional expression (?(pattern)yes|no).
- Character class subtraction (undocumented?) [a-z-[d-w]].
- Balancing group (?<-name>…). This could be simulated with a code evaluation assertion (?{…}) followed by a (?&name).
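To make the .NET-only items a little more concrete, here is a small sketch of three of them in C#. The class name, method names, and test strings are invented for this example; the balanced-parentheses pattern is a commonly used idiom rather than anything specific to this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotNetOnlyFeatures
{
    // Variable-length look-behind: \d+ inside (?<=…) has no fixed width.
    public static string Cents(string s) => Regex.Match(s, @"(?<=\$\d+\.)\d+").Value;

    // Character class subtraction: a-z minus d-w leaves a, b, c, x, y, z.
    public static string MaskOuterLetters(string s) => Regex.Replace(s, @"[a-z-[d-w]]", "_");

    // Balancing group: push a capture on "(", pop one on ")",
    // then fail via (?!) if any capture is still left on the "open" stack.
    public static bool IsBalanced(string s) =>
        Regex.IsMatch(s, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    static void Main()
    {
        Console.WriteLine(Cents("price: $123.45"));
        Console.WriteLine(MaskOuterLetters("abcdwxyz"));
        Console.WriteLine(IsBalanced("(a(b)c)"));
        Console.WriteLine(IsBalanced("(a(b)c"));
    }
}
```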
.NET regular expressions were designed to be compatible with Perl 5 regexes. As such, Perl 5 regexes should just work in .NET.
You can translate some RegexOptions as follows:
```csharp
[Flags]
public enum RegexOptions
{
    Compiled = 8,
    CultureInvariant = 0x200,
    ECMAScript = 0x100,
    ExplicitCapture = 4,
    IgnoreCase = 1,                 // i in Perl
    IgnorePatternWhitespace = 0x20, // x in Perl
    Multiline = 2,                  // m in Perl
    None = 0,
    RightToLeft = 0x40,
    Singleline = 0x10               // s in Perl
}
```
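As a rough illustration of that mapping, a Perl modifier string such as "ix" could be turned into a RegexOptions value with a small helper. PerlFlags and ToRegexOptions are hypothetical names for this sketch, not part of .NET, and only the four flags annotated above are handled.

```csharp
using System;
using System.Text.RegularExpressions;

static class PerlFlags
{
    // Hypothetical helper: maps Perl modifier letters to the matching RegexOptions.
    public static RegexOptions ToRegexOptions(string perlFlags)
    {
        var options = RegexOptions.None;
        foreach (char flag in perlFlags)
        {
            switch (flag)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 'x': options |= RegexOptions.IgnorePatternWhitespace; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 's': options |= RegexOptions.Singleline; break;
                default:
                    throw new ArgumentException($"No RegexOptions mapping for Perl flag '{flag}'.");
            }
        }
        return options;
    }

    static void Main()
    {
        // Perl: /^ hello \s+ world $/ix
        var rx = new Regex(@"^ hello \s+ world $", ToRegexOptions("ix"));
        Console.WriteLine(rx.IsMatch("Hello   World"));
    }
}
```

Flags without an obvious counterpart (for example g, which in .NET corresponds to calling Regex.Matches or Regex.Replace rather than to an option) are deliberately rejected here instead of being silently ignored.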
Another tip is to use verbatim strings so that you don't need to escape all those escape characters in C#:
```csharp
string badOnTheEyesRx    = "\\d{4}/\\d{2}/\\d{2}";
string easierOnTheEyesRx = @"\d{4}/\d{2}/\d{2}";
```
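A quick check that the two spellings really are the same pattern, plus the one escape that verbatim strings still need (a double quote is written as ""). The class name and sample inputs are just for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    public const string Escaped  = "\\d{4}/\\d{2}/\\d{2}";
    public const string Verbatim = @"\d{4}/\d{2}/\d{2}";

    // In a verbatim string the pattern "hi" (with quotes) is written with doubled quotes.
    public const string QuotedHi = @"""hi""";

    static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);
        Console.WriteLine(Regex.IsMatch("2009/03/14", Verbatim));
        Console.WriteLine(Regex.IsMatch("say \"hi\"", QuotedHi));
    }
}
```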
It really depends on the complexity of the regular expression; many will work the same out of the box.
Take a look at this .NET regex cheat sheet to see if an operator does what you expect it to do.
I don't know of any tool that automatically translates between regex dialects.
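Short of a real tool, a handful of the purely syntactic differences could be scripted. The sketch below is a naive, text-level rewriter: NaivePerlRegexRewriter and Rewrite are hypothetical names, it handles only four constructs, and it does not understand escaping or character classes (a literal `\\pL` would be mangled, for instance), so its output would still need to be reviewed by hand.

```csharp
using System;
using System.Text.RegularExpressions;

static class NaivePerlRegexRewriter
{
    // Applies a few mechanical Perl -> .NET substitutions to a pattern string.
    public static string Rewrite(string perlPattern)
    {
        string p = perlPattern;

        // (?P<name>…)  ->  (?<name>…)
        p = p.Replace("(?P<", "(?<");

        // (?P=name)  ->  \k<name>
        p = Regex.Replace(p, @"\(\?P=(\w+)\)", @"\k<${1}>");

        // \g{name}  ->  \k<name>   (numeric and relative forms are left untouched)
        p = Regex.Replace(p, @"\\g\{([A-Za-z_]\w*)\}", @"\k<${1}>");

        // \pL / \PL  ->  \p{L} / \P{L}
        p = Regex.Replace(p, @"\\([pP])([A-Z])", @"\${1}{${2}}");

        return p;
    }

    static void Main()
    {
        string converted = Rewrite(@"(?P<y>\d{4})-(?P=y)");
        Console.WriteLine(converted);
        Console.WriteLine(Regex.IsMatch("2024-2024", converted));
    }
}
```

Constructs with no .NET counterpart, such as recursion or backtracking control verbs, are out of reach for this kind of rewriting and would need the pattern to be redesigned.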