
Can't get a match for regular expression in Java

This is the format/example of the string I want to get data from:

<span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#B82933;font-size:120%' href='/cartelera/pelicula/18312'>Español  </a></span><br><span style='display:block;margin-bottom:3px;'><a style='margin:4px;color:#FBEBC4;font-size:120%' href='/cartelera/pelicula/18313'>Subtitulada  </a></span><br>          </div>

And this is the regular expression I'm using for it:

"pelicula/([0-9]*)'>([\\w\\s]*)</a>"

I tested this regular expression in RegexPlanet, and it worked. It gave me the expected result:

group(1) = 18313
group(2) = Subtitulada

But when I try to use that regular expression in Java, it doesn't match anything. Here's the code:

Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>([\\w\\s]*)</a>");
Matcher matcher = pattern.matcher(inputLine);
while (matcher.find()) {
    // group(2) is the link text, e.g. "Subtitulada"
    version = matcher.group(2);
}

What's the problem? The regular expression has already been tested. In the same code I search for several other patterns, and only two of them give me trouble (I'm showing just one here). Thank you in advance!

EDIT

I discovered the problem. If I view the page source in a browser, it shows everything, but when I fetch the page from Java I get different source code. Why? Because the page asks for your city so that it can show information for that location. I don't know whether there is a workaround to actually access the information I want, but that's the cause.

Your regex is correct, but in Java \w matches only ASCII word characters ([a-zA-Z_0-9]) by default, so it does not match ñ.
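To see the ñ issue in isolation, here is a minimal sketch that runs the question's pattern against a single link. The class name WordClassDemo and the helper firstLabel are made up for illustration. The Pattern.UNICODE_CHARACTER_CLASS flag is an alternative that this answer does not use; it is shown only for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordClassDemo {
    // Sample link from the question; the escape is the letter n with tilde.
    static final String HTML = "<a href='/cartelera/pelicula/18312'>Espa\u00f1ol  </a>";
    static final String REGEX = "pelicula/([0-9]*)'>([\\w\\s]*)</a>";

    // Returns group(2) of the first match, or null when nothing matches.
    static String firstLabel(String input, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Default flags: \w is ASCII-only, so the character class stops
        // before the n with tilde and the overall match fails.
        System.out.println(firstLabel(HTML, 0));
        // With UNICODE_CHARACTER_CLASS, \w follows Unicode definitions,
        // so the accented letter counts as a word character.
        System.out.println(firstLabel(HTML, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

This would also explain why the original pattern only ever reported 18313 / Subtitulada: the first link cannot match, so the second one is the only hit.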

I changed the regex to

"pelicula/([0-9]*)'>(.*?)</a>"

and it seems to match both occurrences. Here I've used the reluctant *? quantifier to prevent .* from matching everything between the first <a> and the last </a>. See "What is the difference between `Greedy` and `Reluctant` regular expression quantifiers?" for an explanation.
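The difference between the two quantifiers can be checked with a short sketch like the one below. The input is a shortened version of the question's HTML with both links on one line; the class name GreedyVsReluctant and the helper labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Two links on a single line, as in the question's sample.
    static final String HTML =
        "<a href='/cartelera/pelicula/18312'>Espa\u00f1ol  </a><br>"
        + "<a href='/cartelera/pelicula/18313'>Subtitulada  </a>";

    static final String GREEDY = "pelicula/([0-9]*)'>(.*)</a>";
    static final String RELUCTANT = "pelicula/([0-9]*)'>(.*?)</a>";

    // Collects group(2) of every match of regex in input.
    static List<String> labels(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Greedy: .* runs to the last </a>, so both links collapse into one match.
        System.out.println(labels(GREEDY, HTML).size());    // 1
        // Reluctant: .*? stops at the first </a>, giving one match per link.
        System.out.println(labels(RELUCTANT, HTML).size()); // 2
    }
}
```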

@Bohemian is correct in pointing out that you might also need to enable the Pattern.DOTALL flag if the text inside <a> contains line breaks.

If your input spans several lines (i.e. it contains newline characters), you'll need to turn on "dot matches newline" so that . can match across line breaks.

There are two ways to do this:

Use the "dot matches newline" switch (?s) in your regex:

Pattern pattern = Pattern.compile("(?s)pelicula/([0-9]*)'>(.*?)</a>");

or use the Pattern.DOTALL flag in the call to Pattern.compile():

Pattern pattern = Pattern.compile("pelicula/([0-9]*)'>(.*?)</a>", Pattern.DOTALL);
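The flag only changes what . matches, so the sketch below uses the (.*?) pattern from the other answer rather than [\\w\\s]*, where \s already matches newlines. The input, with the link text split across two lines, is a hypothetical example; the class name DotAllDemo and the helper firstLabel are likewise made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical input: the link text is split across two lines.
    static final String HTML = "<a href='/cartelera/pelicula/18313'>Subtitulada\n3D</a>";
    static final String REGEX = "pelicula/([0-9]*)'>(.*?)</a>";

    // Returns group(2) of the first match, or null when nothing matches.
    static String firstLabel(String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(HTML);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Without DOTALL, . cannot cross the newline, so there is no match.
        System.out.println(firstLabel(REGEX, 0) == null);
        // Both ways of enabling "dot matches newline" capture the full text.
        System.out.println(firstLabel(REGEX, Pattern.DOTALL) != null);
        System.out.println(firstLabel("(?s)" + REGEX, 0) != null);
    }
}
```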
