This is the format/example of the string I want to get data:
<span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#B82933;font-size:120%' href='/cartelera/pelicula/18312'>Español </a></span><br><span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#FBEBC4;font-size:120%' href='/cartelera/pelicula/18313'>Subtitulada </a></span><br> </div>
And this is the regular expression I'm using for it:
"pelicula/([0-9]*)'>([\\w\\s]*)</a>"
I tested this regular expression in RegexPlanet , and it turned out OK, it gave me the expected result:
group(1) = 18313
group(2) = Subtitulada
But when I try to implement that regular expression in Java, it won't match anything. Here's the code:
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");
Matcher matcher = pattern.matcher(inputLine);
while(matcher.find()){
version = matcher.group(2);
}
}
What's the problem? If the regular expression is already tested, and in that same code I search for more patterns but I'm having trouble with two (I'm showing you here just one). Thank you in advance!
_ EDIT _ _
I discovered the problem... If I check the sourcecode of the page it shows everything, but when I try to consume it from Java, it gets another sourcecode. Why? Because this page asks for your city so it can show information about that. I don't know if there's a workaround about that to actually access the information I want, but that's it.
Your regex is correct but it seems \\w
does not match ñ
.
I changed the regex to
"pelicula/([0-9]*)'>(.*?)</a>"
and it seems to match both the occurrences. Here I've used the reluctant *?
operator to prevent .*
match all characters in between first <a>
till last <\\a>
See What is the difference between `Greedy` and `Reluctant` regular expression quantifiers? for explanation.
@Bohemian is correct in pointing out that you might need to enable the Pattern.DOTALL
flag as well if the text in <a>
has line breaks
If your input is over several lines (ie it contains newline characters) you'll need to turn on "dot matches newline".
There are two way to do this:
Use the "dot matches newline" regex switch (?s)
in your regex:
Pattern pattern = Pattern.compile("(?s)pelicula/([0-9]*)'>([\\w\\s]*)</a>");
or use the Pattern.DOTALL
flag in the call to Pattern.compile()
:
Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>", Pattern.DOTALL);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.