I'm having a hard time understanding what a certain Java regex would match:
"<(\\w+)></\\1>"
I've read through this tutorial: http://docs.oracle.com/javase/tutorial/essential/regex/
But I still can't figure out what that expression would match, especially the \\1
part. I can see that <(\\w+)>
uses a greedy quantifier to match a word, but I don't understand why the ()
are used, since according to the tutorial they are for matching a group.
As for the second part, I just don't know what \\1
would match. I tried it with
"001123344556678899".replaceAll("\\1", "");
since I thought it might match a digit, but it returned my string unchanged; nothing was replaced.
It's intended to match pairs of empty XML/HTML tags (nothing between the opening and closing tag), such as
<tag></tag>
The \\1
means "match the same text as the first captured group", i.e. whatever the part in the parentheses matched. (The double backslash is because backslashes need to be escaped in Java string literals.)
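A minimal sketch that checks a few inputs against the pattern from the question (the class and method names are illustrative, not from the answer). Group 1 captures the tag name, and \1 then has to match that exact same text:

```java
import java.util.regex.Pattern;

public class TagPairDemo {
    // Java string literal for the regex <(\w+)></\1>
    // Group 1 captures the tag name; \1 must then match that same text again.
    private static final Pattern EMPTY_TAG_PAIR = Pattern.compile("<(\\w+)></\\1>");

    static boolean isEmptyTagPair(String s) {
        // matches() requires the whole string to match the pattern
        return EMPTY_TAG_PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmptyTagPair("<tag></tag>"));   // true: closing name repeats group 1
        System.out.println(isEmptyTagPair("<tag></other>")); // false: "other" is not "tag"
        System.out.println(isEmptyTagPair("<tag>x</tag>"));  // false: the pattern allows nothing between the tags
    }
}
```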
I think you may have misunderstood the tutorial. Anything inside ()
is a capturing group, so (\\w{1})(\\w{1})
means you have 2 groups with 1 character in each. The \\1
references the first group. In a search and replace it works more like this (in a Java replacement string the first group is written $1 rather than \\1):
"1234234234234".replaceAll("(23)", "$1ab")
and the result would be "123ab423ab423ab423ab4".
Inside the pattern, \\1
matches the same text your first group matched; in the replacement, $1 inserts that text.
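A small sketch of the difference between referencing a group in the replacement string and writing a backslash there (method names are hypothetical). In Java's replaceAll, the replacement uses $n for groups, while a backslash only escapes the next character, so "\\1ab" inserts a literal "1ab":

```java
public class ReplaceGroupDemo {
    // $1 in the replacement inserts what group 1 captured.
    static String appendAb(String s) {
        return s.replaceAll("(23)", "$1ab");
    }

    // In a replacement string the backslash just escapes the '1',
    // so each "23" becomes the literal text "1ab".
    static String literalBackslashOne(String s) {
        return s.replaceAll("(23)", "\\1ab");
    }

    // Two groups with one character each: swap them using $2$1.
    static String swapPairs(String s) {
        return s.replaceAll("(\\w{1})(\\w{1})", "$2$1");
    }

    public static void main(String[] args) {
        System.out.println(appendAb("1234234234234"));            // 123ab423ab423ab423ab4
        System.out.println(literalBackslashOne("1234234234234")); // 11ab41ab41ab41ab4
        System.out.println(swapPairs("abcd"));                    // badc
    }
}
```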
Just refresh your understanding of regex backreferences (and capturing groups), e.g. here. A capturing group uses ()
and a backreference matches the same text that the referenced group captured.
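Applied to the string from the question: a bare \\1 has no group 1 to refer to, so nothing matches and the string comes back unchanged. A minimal sketch (hypothetical class and method names) that pairs the backreference with a capturing group, so it finds a digit immediately followed by the same digit:

```java
public class DoubledDigitsDemo {
    // (\d) captures one digit; \1 then requires the next character to be that same digit.
    static String collapsePairs(String s) {
        return s.replaceAll("(\\d)\\1", "$1"); // keep one copy of each doubled digit
    }

    static String removePairs(String s) {
        return s.replaceAll("(\\d)\\1", "");   // drop doubled digits entirely
    }

    public static void main(String[] args) {
        String input = "001123344556678899";
        System.out.println(collapsePairs(input)); // 0123456789
        System.out.println(removePairs(input));   // 27
    }
}
```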
Then use this site to test your expression and your data like this:
Regular Expression: <(\w+)></\1>
would become the Java string "<(\\w+)></\\1>"
with input like this: <body></body>
Test  Target String   matches()  replaceFirst()  replaceAll()  group(0)       group(1)
1     <body></body>   Yes        Yes             Yes           <body></body>  body
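The same check can be run locally with java.util.regex. A minimal sketch (class and method names are illustrative) that reproduces the matches(), group(0) and group(1) results for <body></body>:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyTagTest {
    static Matcher matchBody(String input) {
        Matcher m = Pattern.compile("<(\\w+)></\\1>").matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return m; // after a successful matches(), group(0) and group(1) are available
    }

    public static void main(String[] args) {
        Matcher m = matchBody("<body></body>");
        System.out.println(m.group(0)); // <body></body>  (the whole match)
        System.out.println(m.group(1)); // body           (the tag name captured by (\w+))
    }
}
```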