How can I match a repeating pattern with Java regular expressions?

Question

Given the following input string: 3481.7.1071.html

I want to confirm that:

The string has one or more numbers, each followed by a period.
The string ends in html.

Finally, I want to extract the left-most number (i.e. 3481).

My current regex is nearly there but I can't capture the correct group:

final Pattern p = Pattern.compile("(\\d++\\.)+html");   
final Matcher m = p.matcher("3481.7.1071.html");
if (m.matches()) {
    final String corrected = m.group(1)+"html"; // WRONG! Gives 1071.html
}

How do I capture the first match?

Answer 1

You can just factor it out:

(\d+\.)(\d+\.)*html
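As a rough illustration of the factored pattern in Java: a repeated capturing group only keeps its last iteration, so giving the first "digits plus dot" unit its own group is what makes it retrievable. The class and method names here are made up for the example, and note that group 1 still includes the trailing dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class FactoredFirstGroup {
    // Group 1 matches the first "digits." unit exactly once;
    // group 2 is the repeated group covering any remaining units.
    private static final Pattern P = Pattern.compile("(\\d+\\.)(\\d+\\.)*html");

    /** Returns the left-most "digits." unit plus "html", or null if the input does not match. */
    static String corrected(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "html"; // group(1) includes the trailing dot
    }

    public static void main(String[] args) {
        System.out.println(corrected("3481.7.1071.html")); // 3481.html
    }
}
```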

Answer 2

"^(\\d+)\\.(\\d+\\.)*html$"
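A minimal sketch of how this anchored pattern might be used, assuming the goal is the bare number without its dot. The class and method names are illustrative only. Because the dot sits outside group 1, the captured text is just the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class AnchoredLeftmost {
    // ^ and $ anchor the pattern; the dot after the first number
    // is outside the capturing group.
    private static final Pattern P = Pattern.compile("^(\\d+)\\.(\\d+\\.)*html$");

    /** Returns the left-most number, or null if the input does not fit the pattern. */
    static String leftmost(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leftmost("3481.7.1071.html")); // 3481
    }
}
```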

Answer 3

groovy:000> p = java.util.regex.Pattern.compile("(\\d+).*") 
===> (\d+).*
groovy:000> m = p.matcher("3481.7.1071.html")
===> java.util.regex.Matcher[pattern=(\d+).* region=0,16 lastmatch=]
groovy:000> m.find()
===> true
groovy:000> m.group(1)+".html"
===> 3481.html
groovy:000>

Answer 4

Yes, you can.

If 123.html and 1.23html are both valid, use this:

^(?:(\d+)\.).*?html$

If 123.html is invalid but 1.23html is valid, use this:

^(?:(\d+)\.(?!h)).*?html$

If 123.html and 1.23html are both invalid and only 1.23.html is valid, use this:

^(?:(\d+)\.).*?\.html$
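To compare the three variants side by side, something like the following could be used. The constant and method names are invented for this sketch, and the sample inputs are the three strings mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness; names are illustrative only.
class VariantCheck {
    // Accepts 123.html, 1.23html and 1.23.html.
    static final Pattern LOOSE = Pattern.compile("^(?:(\\d+)\\.).*?html$");
    // Rejects inputs where the first dot is immediately followed by 'h' (e.g. 123.html).
    static final Pattern FIRST_DOT_NOT_BEFORE_H = Pattern.compile("^(?:(\\d+)\\.(?!h)).*?html$");
    // Requires a second dot directly before "html" (e.g. 1.23.html).
    static final Pattern DOT_HTML_ONLY = Pattern.compile("^(?:(\\d+)\\.).*?\\.html$");

    /** Returns the first number captured by p, or null if s does not match. */
    static String first(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"123.html", "1.23html", "1.23.html"};
        for (String s : inputs) {
            System.out.println(s + " -> " + first(LOOSE, s) + ", "
                    + first(FIRST_DOT_NOT_BEFORE_H, s) + ", "
                    + first(DOT_HTML_ONLY, s));
        }
    }
}
```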

Answer 5

Java style: "(\\d+)\\..*?\\.html$"

This will 1) grab the first group of consecutive digits, 2) require a dot afterwards, 3) skip over everything up to 4) the literal string '.html' at the end.

If you mean "one or more [groups] of numbers followed by a period", then this is more along the lines of your requirements:

"(\\d+)(?:\\.\\d+)*\\.html$"

This way you get a number and not the dot. And none of the other patterns need to be captured, so they are not.
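A small sketch contrasting the two patterns from this answer, with invented class and method names. Neither pattern is anchored at the start, so `find()` is used rather than `matches()`. The first pattern accepts anything between the first dot and `.html`, while the second only accepts dot-separated numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class LooseVersusStrict {
    // Digits, a dot, then anything (lazily) up to a final ".html".
    static final Pattern LOOSE = Pattern.compile("(\\d+)\\..*?\\.html$");
    // Digits, then zero or more non-captured ".digits" groups, then ".html".
    static final Pattern STRICT = Pattern.compile("(\\d+)(?:\\.\\d+)*\\.html$");

    /** Returns group 1 of the first match of p in s, or null if there is none. */
    static String firstNumber(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber(LOOSE, "3481.7.1071.html"));  // 3481
        System.out.println(firstNumber(STRICT, "3481.7.1071.html")); // 3481
        System.out.println(firstNumber(LOOSE, "3481.x.html"));       // 3481
        System.out.println(firstNumber(STRICT, "3481.x.html"));      // null
    }
}
```

Note that the first pattern needs at least two dots, so it does not match `3481.html`, whereas the second one does.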

Answer 6

jpalecek's solution fails; it captures the rightmost number. The original poster was a lot closer, but he got the right-most number. To get the left-most number, ignore anything after the first dot:

[^\d]*(\d+)\..*html

[^\d]* ignores everything before the left-most number (so X1.html captures the number 1).
(\d+)\. captures the first run of digits, provided it is followed by a dot.
.* ignores everything between that dot and the final html.
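For reference, a minimal Java sketch of this pattern, using `matches()` so that the whole input has to fit. The class and method names are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class SkipLeadingNonDigits {
    // Non-digits, then the captured number, a dot, anything, and a final "html".
    private static final Pattern P = Pattern.compile("[^\\d]*(\\d+)\\..*html");

    /** Returns the left-most number, or null if the whole input does not match. */
    static String leftmost(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leftmost("3481.7.1071.html")); // 3481
        System.out.println(leftmost("X1.html"));          // 1
    }
}
```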

6 answers

solution1
7 2009-04-02 09:21:21

solution2
3 2009-04-02 09:54:15

solution3
0 2012-08-31 14:11:35

solution4
0 2012-08-31 14:38:50

solution5
0 2009-04-02 16:57:05

solution6
-1 2009-04-02 09:56:39
