
Regex conversion from PHP to Java

I have a regular expression in PHP and I need to convert it to Java. Is it possible to do so? If yes, how can I do it?

Thanks in advance.

$region_pattern = "/<a href=\"#\"><img src=\"images\/ponto_[^\.]+\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\(<]+)\((?P<region>[^\)]+)\)<\/strong><\/a>/i" ;

A typical conversion of a regex from another language to Java involves these steps:

  • Exclude the pattern delimiters: remove the leading and trailing / .
  • Remove the flags (here, the trailing i ). In Java they are applied to the Pattern object, so either pass them when you create the Pattern or prepend them to the regex in inline form, like (?i)<regex> .
  • Replace every \ with \\ . The backslash already has a meaning in Java (it is the escape character in string literals), so to get a backslash into a regex you have to write \\ instead of \ . Thus \w becomes \\w and \\ becomes \\\\ .
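As a rough illustration of the three steps above, here is a minimal sketch applied to a much shorter pattern. The PHP pattern shown in the comment, the class name RegexConversionDemo and the sample input are made up for this example and are not part of the question.

```java
import java.util.regex.Pattern;

public class RegexConversionDemo {
    // Hypothetical PHP original:  '/ponto_\w+\.gif/i'
    // Step 1: drop the / delimiters.
    // Step 2: move the trailing i flag to the Pattern (or inline it).
    // Step 3: double every backslash inside the Java string literal.

    // Flag passed as the second argument to Pattern.compile.
    static final Pattern WITH_FLAG =
            Pattern.compile("ponto_\\w+\\.gif", Pattern.CASE_INSENSITIVE);

    // Same regex with the flag written inline at the start.
    static final Pattern INLINE =
            Pattern.compile("(?i)ponto_\\w+\\.gif");

    // True when the pattern occurs anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "images/PONTO_azul.GIF"; // sample input, assumed
        System.out.println(found(WITH_FLAG, input));
        System.out.println(found(INLINE, input));
    }
}
```

Both variants should behave the same way; which one to pick is mostly a matter of whether you want the flag visible in the regex text itself.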

The regex above would become:

Pattern.compile("<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>", Pattern.CASE_INSENSITIVE);

This will fail, however. I think it is because Java reads ?P as an inline modifier, and no such modifier exists in Java, so it rejects the pattern as an invalid regex.
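One way to check this is to try compiling both spellings of a named group and catch the exception. The sketch below assumes a standard JDK 7 or later; the class name NamedGroupSyntaxCheck and the helper compiles are made up for the example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedGroupSyntaxCheck {
    // Returns true if java.util.regex accepts the pattern, false if it
    // throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // PHP/Python-style named group: expected to be rejected by Java.
        System.out.println(compiles("(?P<region>[^)]+)"));
        // Named group without the P: expected to be accepted on Java 7+.
        System.out.println(compiles("(?<region>[^)]+)"));
    }
}
```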

There are some problems with the original regex that have to be cleared away first. First, there's [&nbsp;] , which is a character class: it matches any one of the characters & , n , b , s , p or ; . To match an actual non-breaking space character, you should use \xA0 .
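The difference is easy to see with a few whole-string matches. This is only a sketch: the class name NbspClassDemo and the test strings are made up, and the last case (a non-capturing group around the literal entity text) is an extra assumption about what the original author may have intended, not something the question states.

```java
import java.util.regex.Pattern;

public class NbspClassDemo {
    // True when the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The character class accepts any mix of the six characters.
        System.out.println(matchesWhole("[&nbsp;]*", "bnp;s&"));
        // It does not accept a real non-breaking space (U+00A0).
        System.out.println(matchesWhole("[&nbsp;]*", "\u00A0"));
        // \xA0 (written \\xA0 in a Java string) matches the real character.
        System.out.println(matchesWhole("\\xA0*", "\u00A0\u00A0"));
        // Assumption: to match the literal entity text, group it instead.
        System.out.println(matchesWhole("(?:&nbsp;)*", "&nbsp;&nbsp;"));
    }
}
```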

You also have a lot of unneeded backslashes in there. You can get rid of some by changing the regex delimiter to something other than / ; others aren't needed because they're inside character classes, where most metacharacters lose their special meanings. That leaves you with this PHP regex:

"~<a href=\"#\"><img src=\"images/ponto_[^.]+\.gif\"[^>]*>\xA0*<strong>(?P<neighborhood>[^(<]+)\((?P<region>[^)]+)\)</strong></a>~i"

There are three things that make this regex incompatible with Java. One is the delimiters ( / originally, ~ in the version above) along with the trailing i modifier. Java doesn't use regex delimiters at all, so just drop those. The modifier can be moved into the regex itself by using the inline form, (?i) , at the beginning of the regex. (That will work in PHP too, by the way.)

Next are the backslashes. The ones that are used to escape quotation marks remain as they are, but all the others get doubled because Java is more strict about escape sequences in string literals.

Finally, there are the named groups. Up until Java 6, named groups weren't supported at all; Java 7 supports them, but with the shorter (?<name>...) syntax favored by .NET, not the Pythonesque (?P<name>...) syntax. (By the way, the shorter (?<name>...) version should work in PHP too, as should (?'name'...) , also introduced by .NET.)

So the Java 7 version of your regex would be:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong></a>"
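A minimal sketch of how that pattern might be used on Java 7 or later follows. The class name RegionExtractor, the helper extract and the sample HTML are assumptions made for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionExtractor {
    // The Java 7 regex from above, split across two literals for readability.
    static final Pattern REGION = Pattern.compile(
            "(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*"
            + "<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong></a>");

    // Returns {neighborhood, region}, or null when nothing matches.
    static String[] extract(String html) {
        Matcher m = REGION.matcher(html);
        if (!m.find()) {
            return null;
        }
        // trim() drops the space that precedes the opening parenthesis.
        return new String[] { m.group("neighborhood").trim(), m.group("region") };
    }

    public static void main(String[] args) {
        // Hypothetical input shaped like the markup the regex targets.
        String html = "<a href=\"#\"><img src=\"images/ponto_azul.gif\" border=\"0\">"
                + "<strong>Centro (Zona Sul)</strong></a>";
        String[] result = extract(html);
        System.out.println(result[0] + " / " + result[1]);
    }
}
```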

For Java 6 or earlier you would use:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>([^(<]+)\\(([^)]+)\\)</strong></a>"

...and you'd have to use numbers instead of names to refer to the captured groups.
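With numbered groups, one option is to keep the group indexes in named constants so the call sites stay readable. This is a sketch only: the class name, constants and sample input (written in upper-case tags to exercise the (?i) flag) are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionExtractorJava6 {
    // Group numbers follow the order of the opening parentheses.
    static final int NEIGHBORHOOD = 1;
    static final int REGION = 2;

    // The Java 6 regex from above, with plain capturing groups.
    static final Pattern PATTERN = Pattern.compile(
            "(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*"
            + "<strong>([^(<]+)\\(([^)]+)\\)</strong></a>");

    // Returns {neighborhood, region}, or null when nothing matches.
    static String[] extract(String html) {
        Matcher m = PATTERN.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(NEIGHBORHOOD).trim(), m.group(REGION) };
    }

    public static void main(String[] args) {
        // Hypothetical input; upper-case tags still match because of (?i).
        String html = "<A HREF=\"#\"><IMG SRC=\"images/ponto_verde.gif\">"
                + "<STRONG>Lapa(Oeste)</STRONG></A>";
        String[] result = extract(html);
        System.out.println(result[0] + " / " + result[1]);
    }
}
```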

Regex is regex regardless of language. The core of the regex you've posted will work in both Java and PHP. You do need to make some adjustments, because the two languages don't accept the pattern in exactly the same form (though the pattern logic itself carries over to both).

Points to Consider

  • Java's Pattern object applies flags through its own API, so you don't have to specify them in the pattern string itself.
  • Delimiters should not be included either. Pass only the pattern itself.
