
Parsing HTML by the size of an input field in Java with JSoup

I'm trying to parse an error file from a system that presents it to me as HTML. I don't find it very pretty, but this is what I have to work with.

Each error is identified by a code made up of a set and a message ID, which I can look up in a catalog.

<HTML>
<BODY>
<h4>2020-07-16 10:24:22.614</h4>
<SPAN STYLE="color:black; font:bold;">&nbsp;&nbsp;Set:<INPUT TYPE="text" VALUE="158" SIZE=3</INPUT>&nbsp;&nbsp;Id: <INPUT TYPE="text" VALUE="10420" SIZE=5</INPUT>
</SPAN>
</BODY>
</HTML>

I'm trying to extract the timestamp and the two values in the input fields with JSoup. The timestamp is no problem at all, but I can't find a way to get the Set and the Id of the message.

Document doc = Jsoup.parse(errorLog, "UTF-8", "");
Element body = doc.body();

Elements MessageTimestamps = doc.select("h4");
Elements MessageSets = doc.getElementsByAttributeValue("SIZE", "3");
Elements MessageID = doc.getElementsByAttributeValue("SIZE","5");

String[] timestampArray = new String[MessageTimestamps.size()];
System.out.println("Total: " + timestampArray.length);

for(int i = 0; i< MessageTimestamps.size(); i++) {
    System.out.println("Timestamp: " + MessageTimestamps.get(i).text());
    System.out.println("MessageSets: " + MessageSets.get(i).text());
}

Result:

Total: 6
Timestamp: 2020-07-16 10:24:22.614
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

Does anyone have an idea?
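To see why getElementsByAttributeValue("SIZE", "3") comes back empty (so that MessageSets.get(i) then throws), one can dump the attributes jsoup actually parsed. This is a minimal diagnostic sketch, assuming jsoup is on the classpath; the class name InputSizeDump and the helper rawSizes are hypothetical. Because "SIZE=3</INPUT>" has no ">" between the 3 and "</INPUT", the HTML5 tokenizer keeps reading the unquoted attribute value up to the next ">", so the printout should show size holding something like 3</INPUT rather than 3.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InputSizeDump {

    // Collects the SIZE attribute of every INPUT exactly as the parser stored it
    static List<String> rawSizes(String html) {
        Document doc = Jsoup.parse(html);
        List<String> sizes = new ArrayList<>();
        for (Element input : doc.select("input")) {
            sizes.add(input.attr("size"));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Same malformed fragment as in the question: no '>' after SIZE=3 / SIZE=5
        String html = "<SPAN>Set:<INPUT TYPE=\"text\" VALUE=\"158\" SIZE=3</INPUT>"
                + "Id: <INPUT TYPE=\"text\" VALUE=\"10420\" SIZE=5</INPUT></SPAN>";

        // Print every attribute of every INPUT, in brackets so the full value is visible
        for (Element input : Jsoup.parse(html).select("input")) {
            for (Attribute a : input.attributes()) {
                System.out.println(a.getKey() + " = [" + a.getValue() + "]");
            }
        }
        System.out.println("SIZE values: " + rawSizes(html));
    }
}
```

Note that jsoup lower-cases attribute names by default, so the keys print as type, value and size; the case of the key is not what breaks the exact match, the trailing "</INPUT" in the value is.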

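The exact-match lookups find nothing because the markup is malformed: SIZE=3</INPUT> is missing the ">" that should close the INPUT tag, so the parser reads everything up to the next ">" as the attribute value and SIZE ends up as "3</INPUT" (and "5</INPUT"). MessageSets is therefore empty, and MessageSets.get(0) throws the IndexOutOfBoundsException. You could instead select the input fields whose SIZE attribute contains 3 or 5 with the *= attribute selector, and read their value attribute rather than text() (an INPUT element has no text content):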

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Main {
    public static void main(String[] args) {
        String html = "<HTML>\n"
                + "<BODY>\n"
                + "<h4>2020-07-16 10:24:22.614</h4>\n"
                + "<SPAN STYLE=\"color:black; font:bold;\">&nbsp;&nbsp;Set:<INPUT TYPE=\"text\" VALUE=\"158\" SIZE=3</INPUT>&nbsp;&nbsp;Id: <INPUT TYPE=\"text\" VALUE=\"10420\" SIZE=5</INPUT>\n"
                + "</SPAN>\n"
                + "</BODY>\n"
                + "</HTML>";
        Document doc = Jsoup.parse(html);
        Element time = doc.selectFirst("h4");
        // SIZE is parsed as "3</INPUT" / "5</INPUT", so match on "contains" (*=) instead of equality
        Element set = doc.selectFirst("INPUT[SIZE*=3]");
        Element id = doc.selectFirst("INPUT[SIZE*=5]");

        System.out.println(time.text());
        // The data lives in the VALUE attribute; text() on an INPUT would return ""
        System.out.println(set.attr("value"));
        System.out.println(id.attr("value"));
    }
}
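The answer's code uses selectFirst, so it only picks up the first entry, while the question's log contains several (Total: 6). Below is a sketch that collects every entry, assuming each h4 timestamp is immediately followed by the SPAN holding its two inputs, with Set first and Id second; that layout is inferred from the single sample in the question, not confirmed for the full file. The names ErrorLogEntries, Entry and parse are hypothetical, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ErrorLogEntries {

    // Illustrative holder for one log entry (not part of the original code)
    static final class Entry {
        final String timestamp;
        final String set;
        final String id;

        Entry(String timestamp, String set, String id) {
            this.timestamp = timestamp;
            this.set = set;
            this.id = id;
        }

        @Override
        public String toString() {
            return timestamp + " set=" + set + " id=" + id;
        }
    }

    static List<Entry> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<Entry> entries = new ArrayList<>();
        for (Element h4 : doc.select("h4")) {
            // Assumption: the next element after each <h4> is the <SPAN> with its inputs
            Element span = h4.nextElementSibling();
            if (span == null || !span.is("span")) {
                continue;
            }
            Elements inputs = span.select("input");
            if (inputs.size() < 2) {
                continue;
            }
            // Assumption: first input = Set, second input = Id.
            // INPUT elements have no text, so read the VALUE attribute.
            entries.add(new Entry(h4.text(),
                    inputs.get(0).attr("value"),
                    inputs.get(1).attr("value")));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample taken from the question
        String html = "<HTML><BODY><h4>2020-07-16 10:24:22.614</h4>"
                + "<SPAN>&nbsp;&nbsp;Set:<INPUT TYPE=\"text\" VALUE=\"158\" SIZE=3</INPUT>"
                + "&nbsp;&nbsp;Id: <INPUT TYPE=\"text\" VALUE=\"10420\" SIZE=5</INPUT>"
                + "</SPAN></BODY></HTML>";
        List<Entry> entries = parse(html);
        System.out.println("Total: " + entries.size());
        for (Entry e : entries) {
            System.out.println(e);
        }
    }
}
```

Matching by position inside each span avoids depending on how the parser recovers from the broken SIZE attributes. If the order of the inputs cannot be relied on, the span.select call could use the answer's INPUT[SIZE*=3] / INPUT[SIZE*=5] selectors instead, or input[size^=3] and input[size^=5] ("starts with"), which are a little stricter than "contains".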
