Java Pattern: find the text inside an HTML tag

I want to find the text 'ABCD' in:

String text = "<div class=\"aaaa\">1234</div>"
            + "   <li class=\"pcs05\">ABCD</li>";

Pattern p = Pattern.compile("<li class=[^A-Za-z0-9]>(\\S+)</li>");
Matcher m = p.matcher(text);
if(m.find()){
    System.out.println(m.group(1));
}

But it doesn't print anything.

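// [^A-Za-z0-9] matches exactly one non-alphanumeric character, so it cannot match "pcs05".
// Match the quotes explicitly, with one or more alphanumerics between them.
String text = "<div class=\"aaaa\">1234</div>"
            + "<li class=\"pcs05\">ABCD</li>";
Pattern p = Pattern.compile("<li class=\"[A-Za-z0-9]+\">(\\S+)</li>");
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(1)); // ABCD
}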
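
To see the difference between the two patterns side by side, here is a minimal self-contained sketch. The class name LiTextDemo and the helper firstGroup are made up for illustration and are not part of the original question or answer. It only uses java.util.regex and assumes the class value contains nothing but letters and digits.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiTextDemo {
    // Pattern from the question: [^A-Za-z0-9] matches exactly ONE
    // non-alphanumeric character, so after the opening quote it expects '>'.
    static final Pattern ORIGINAL =
            Pattern.compile("<li class=[^A-Za-z0-9]>(\\S+)</li>");

    // Corrected pattern: opening quote, one or more alphanumerics, closing quote.
    static final Pattern FIXED =
            Pattern.compile("<li class=\"[A-Za-z0-9]+\">(\\S+)</li>");

    // Hypothetical helper: returns capture group 1 of the first match, if any.
    static Optional<String> firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "<div class=\"aaaa\">1234</div>"
                    + "   <li class=\"pcs05\">ABCD</li>";

        System.out.println(firstGroup(ORIGINAL, text)); // Optional.empty
        System.out.println(firstGroup(FIXED, text));    // Optional[ABCD]
    }
}
```

Note that (\\S+) still stops working as soon as the element text contains whitespace, which is one reason a parser is usually the safer choice for real HTML.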

The preferred tool for this kind of task is an HTML or XML parser (for more information, see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?"). One of the simpler parsers I like to use is jsoup. A nice thing about it is that it supports CSS query syntax.

So your code could look like:

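import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String text = "<div class=\"aaaa\">1234</div>"
            + "   <li class=\"pcs05\">ABCD</li>";

// Parse the fragment and select every <li> element with a CSS query.
Document doc = Jsoup.parse(text);
String liValue = doc.select("li").text();

System.out.println(liValue);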

Output: ABCD
