Using regular expressions

Question

I am having problems trying to use a regular expression in Java that I already use in JavaScript. On a web page, you may have:

<b>Renewal Date:</b> 03 May 2010</td>

I just want to be able to pull out the 03 May 2010, bearing in mind that a web page has more than just the above content. The way I currently do this in JavaScript is:

DateStr = /<b>Renewal Date:<\/b>(.+?)<\/td>/.exec(returnedHTMLPage);

I tried to follow some tutorials on java.util.regex.Pattern and java.util.regex.Matcher with no luck. I can't seem to translate (.+?) into something they can understand.

Thanks,

Noeneel

Answer 1

This is how regular expressions are used in Java:

Pattern p = Pattern.compile("<b>Renewal Date:</b>(.+?)</td>");
Matcher m = p.matcher(returnedHTMLPage);

if (m.find()) // find the next match (and "generate the groups")
    System.out.println(m.group(1)); // prints whatever the .+? expression matched.

There are other useful methods in the Matcher class, such as m.matches(). Have a look at the Matcher documentation.
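For readers who want something they can compile and run, the snippet above can be wrapped up roughly as follows. This is only a sketch: the class name RenewalDateExtractor, the helper extractRenewalDate, and the sample page string are made up for illustration, and the trim() call is an addition to drop the space that follows the label.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; not part of the original answer.
class RenewalDateExtractor {
    // Compiled once and reused. "/" needs no escaping in a Java regex.
    private static final Pattern RENEWAL_DATE =
            Pattern.compile("<b>Renewal Date:</b>(.+?)</td>");

    // Returns the trimmed text between the label and </td>, or null if absent.
    static String extractRenewalDate(String html) {
        Matcher m = RENEWAL_DATE.matcher(html);
        // find() looks for the pattern anywhere inside the input.
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Made-up sample page for illustration.
        String page = "<html><body><table><tr><td>"
                + "<b>Renewal Date:</b> 03 May 2010</td></tr></table></body></html>";
        System.out.println(extractRenewalDate(page)); // prints "03 May 2010"
    }
}
```

Returning null when nothing is found is just one possible convention; an Optional<String> would work equally well.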

Answer 2

On `matches` vs `find`

The problem is that you used matches when you should've used find. From the API:

The matches method attempts to match the entire input sequence against the pattern.

The find method scans the input sequence looking for the next subsequence that matches the pattern.

Note that String.matches(String regex) also looks for a full match of the entire string. Unfortunately String does not provide a partial regex match, but you can always use s.matches(".*pattern.*") instead.
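The difference is easy to see side by side. The following is a minimal sketch with made-up sample strings and a hypothetical class name (MatchesVsFind). It also shows one caveat of the ".*pattern.*" workaround: by default . does not match line terminators, so on multi-line input such as a real HTML page the (?s) (DOTALL) flag is needed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the sample strings are invented for illustration.
class MatchesVsFind {
    static final Pattern DATE = Pattern.compile("<b>Expiry Date:</b>(.+?)</td>");

    // matches(): the WHOLE input must match the pattern.
    static boolean fullMatch(String input) {
        return DATE.matcher(input).matches();
    }

    // find(): some subsequence of the input must match the pattern.
    static boolean partialMatch(String input) {
        return DATE.matcher(input).find();
    }

    public static void main(String[] args) {
        String fragment = "<b>Expiry Date:</b> 03 May 2010</td>";
        String page = "<tr><td>" + fragment + "</tr>";
        String multiLine = "<html>\n" + fragment + "\n</html>";

        System.out.println(fullMatch(fragment));  // true: the fragment is the whole input
        System.out.println(fullMatch(page));      // false: extra text around the fragment
        System.out.println(partialMatch(page));   // true

        // String.matches is also a full match, hence the ".*" padding.
        System.out.println(page.matches(".*Expiry Date.*"));          // true
        // "." stops at line terminators unless DOTALL is switched on.
        System.out.println(multiLine.matches(".*Expiry Date.*"));     // false
        System.out.println(multiLine.matches("(?s).*Expiry Date.*")); // true
        System.out.println(partialMatch(multiLine));                  // true
    }
}
```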

On reluctant quantifier

Java understands (.+?) perfectly.

Here's a demonstration: you're given a string s that consists of a string t repeated at least twice. Find t.

System.out.println("hahahaha".replaceAll("^(.+)\\1+$", "($1)"));
// prints "(haha)" -- greedy takes longest possible

System.out.println("hahahaha".replaceAll("^(.+?)\\1+$", "($1)"));
// prints "(ha)" -- reluctant takes shortest possible

On escaping metacharacters

It should also be said that you have injected \ into your regex ("\\" as a Java string literal) unnecessarily.

        String regexDate = "<b>Expiry Date:<\\/b>(.+?)<\\/td>";
                                            ^^         ^^
        Pattern p2 = Pattern.compile("<b>Expiry Date:<\\/b>");
                                                      ^^

\ is used to escape regex metacharacters (it is written "\\" in a Java string literal). A / is NOT a regex metacharacter.

See also

Regular expressions and escaping special characters
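To make the point concrete, here is a small sketch (the class name EscapingDemo and the helper found are invented for this example). It shows that escaping / is harmless but pointless, while a true metacharacter such as . does change meaning when escaped. Pattern.quote is the standard way to treat an arbitrary string as a literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; inputs are made up for illustration.
class EscapingDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String html = "<b>Expiry Date:</b> 03 May 2010</td>";

        // "/" is an ordinary character: both forms behave identically.
        System.out.println(found("<b>Expiry Date:</b>", html));   // true
        System.out.println(found("<b>Expiry Date:<\\/b>", html)); // true

        // "." IS a metacharacter: unescaped it matches any character.
        System.out.println(found("1.5", "125"));   // true
        System.out.println(found("1\\.5", "125")); // false
        System.out.println(found("1\\.5", "1.5")); // true

        // Pattern.quote escapes a whole literal for you.
        System.out.println(found(Pattern.quote("1.5"), "125")); // false
    }
}
```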

Answer 3

Ok, so using aioobe's original suggestion (which I also tried earlier), I have:

String regexDate = "<b>Expiry Date:</b>(.+?)</td>";
Pattern p = Pattern.compile(regexDate);
Matcher m = p.matcher(returnedHTML);

if (m.matches()) // check if it matches (and "generate the groups")
{
  System.out.println("*******REGEX RESULT*******"); 
  System.out.println(m.group(1)); // prints whatever the .+? expression matched.
  System.out.println("*******REGEX RESULT*******"); 
}

The if statement must keep evaluating to false, as the *******REGEX RESULT******* banner is never printed.

In case anyone missed what I am trying to achieve: I just want to get the date out. Somewhere in an HTML page is a date like <b>Expiry Date:</b> 03 May 2010</td>, and I want the 03 May 2010.

Answer 4

(.+?) is an odd choice. Try ( *[0-9]+ *[A-Za-z]+ *[0-9]+ *) or just ([^<]+) instead.
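A rough comparison of the three capture groups, as a sketch only: the class name DateGroupDemo, the helper extract, and the sample inputs are assumptions made for this example. One practical difference worth noting is that a negated class such as [^<] matches line terminators, whereas . does not unless DOTALL is enabled, so ([^<]+) still works when the cell content is wrapped across lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; sample inputs are invented for illustration.
class DateGroupDemo {
    static final Pattern LAZY_DOT =
            Pattern.compile("<b>Expiry Date:</b>(.+?)</td>");
    static final Pattern NEGATED =
            Pattern.compile("<b>Expiry Date:</b>([^<]+)</td>");
    static final Pattern STRICT =
            Pattern.compile("<b>Expiry Date:</b>( *[0-9]+ *[A-Za-z]+ *[0-9]+ *)</td>");

    // Returns the trimmed first group, or null when the pattern is not found.
    static String extract(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String oneLine = "<td><b>Expiry Date:</b> 03 May 2010</td>";
        String wrapped = "<td><b>Expiry Date:</b>\n 03 May 2010\n</td>";

        System.out.println(extract(LAZY_DOT, oneLine)); // 03 May 2010
        System.out.println(extract(NEGATED, oneLine));  // 03 May 2010
        System.out.println(extract(STRICT, oneLine));   // 03 May 2010

        System.out.println(extract(LAZY_DOT, wrapped)); // null: "." stops at the newline
        System.out.println(extract(NEGATED, wrapped));  // 03 May 2010
    }
}
```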
