
How can I filter HTML file content with Java regular expressions?

Using Java, I can download a web page into a .txt file. Now I want to read values from this file with a regular expression. Below is a small part of the real HTML:

<div>
   <input id="_NAME" value="/John/" />
   <input id="_LASTNAME" value="/BOND/"/>
   <input id="_Class"   value="5" />
</div>

I just want to read the values by id (_NAME and _LASTNAME). Thanks in advance.

As long as a browser can display the HTML file, Jsoup should be able to parse it: like a browser, Jsoup tolerates malformed markup. Since you are only querying attributes of the input elements, you don't need to worry about the structure of the resulting DOM.
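To read only the two ids the question asks about, rather than every input, a lookup by id is enough. The following is a minimal sketch, assuming jsoup is on the classpath; the class `InputValueById` and the helper `valueOf` are hypothetical names, and the inline HTML string stands in for the downloaded page. For a page saved to disk, `Jsoup.parse(new File(path), "UTF-8")` can build the `Document` directly instead of reading the file into a string first.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InputValueById {
    // Hypothetical helper: returns the value attribute of the element with the
    // given id (ids are matched case-sensitively), or null if no element has that id.
    static String valueOf(Document doc, String id) {
        Element input = doc.getElementById(id);
        return input == null ? null : input.attr("value");
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page; a saved file could be parsed with
        // Jsoup.parse(new File(path), "UTF-8") instead.
        Document doc = Jsoup.parse("<div>"
                + "<input id=\"_NAME\" value=\"/John/\" />"
                + "<input id=\"_LASTNAME\" value=\"/BOND/\"/>"
                + "<input id=\"_Class\" value=\"5\" />"
                + "</div>");
        System.out.println(valueOf(doc, "_NAME"));     // prints /John/
        System.out.println(valueOf(doc, "_LASTNAME")); // prints /BOND/
    }
}
```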

Sample code using your example HTML, with a bunch of malformed HTML tags in front:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class SO27938637 {
    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div><span><div><b>sdf</span>dsf<i>sdfdsfsdfds<span></div><div>\n    <input id=\"_NAME\" value=\"/John/\" />\n   <input id=\"_LASTNAME\" value=\"/BOND/\"/>\n   <input id=\"_Class\"   value=\"5\" /></div>");
        Elements inputElement = doc.select("input");

        for (Element e: inputElement) {
            System.out.println(e.attr("id") + ": " + e.attr("value"));
        }
    }
}

Output:

_NAME: /John/
_LASTNAME: /BOND/
_Class: 5
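If you must stay with regular expressions as the question asks, a pattern can work for markup as regular as this snippet, but it is brittle. The sketch below assumes lowercase `input` tags, double-quoted attributes, and `id` appearing before `value` in each tag; `InputValueRegex` and `valueForId` are hypothetical names, and only the standard `java.util.regex` API is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InputValueRegex {
    // Hypothetical helper. Assumptions: lowercase <input> tag, double-quoted
    // attributes, id before value, and no '>' inside any attribute value.
    // Returns the value of the first <input> whose id equals the given id, or null.
    static String valueForId(String html, String id) {
        Pattern p = Pattern.compile(
                "<input\\b[^>]*?\\bid\\s*=\\s*\"" + Pattern.quote(id) + "\""
                + "[^>]*?\\bvalue\\s*=\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div>\n"
                + "   <input id=\"_NAME\" value=\"/John/\" />\n"
                + "   <input id=\"_LASTNAME\" value=\"/BOND/\"/>\n"
                + "   <input id=\"_Class\"   value=\"5\" />\n"
                + "</div>";
        System.out.println(valueForId(html, "_NAME"));     // prints /John/
        System.out.println(valueForId(html, "_LASTNAME")); // prints /BOND/
    }
}
```

If any of those assumptions breaks (single-quoted attributes, `value` written before `id`, a `>` inside an attribute), the match fails silently, which is why parsing the HTML with Jsoup is the more robust choice.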
