How to effectively know what's wrong with input via Jsoup?

I am trying to validate HTML with Jsoup, but Jsoup.isValid always returns false, and I'm in the dark because it doesn't tell me where the error is.

Here is my code:

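import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class PageWhitelist extends Whitelist {
    public PageWhitelist() {
        // Tags that may appear in the page
        addTags("html", "head", "meta", "style", "body", "a", "div");
        // Restrict href values on <a> to http URLs
        addProtocols("a", "href", "http");
    }
}

// Inside the test method:
String markup = "<body><head>...";

PageWhitelist whitelist = new PageWhitelist();
boolean valid = Jsoup.isValid(markup, whitelist);
assertTrue(valid);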

valid simply evaluates to false, the test fails, and Jsoup gives me no clue whatsoever about what is causing the error.

How can I know what is really going on?

Well, I've never used isValid, because HTML validation depends on what you actually expect to see in the page. For example, say you request a page and select an element. If that selection comes back empty, the HTML is invalid as far as I'm concerned.

What I do is this: say I want an anchor like

<a href="http://stackoverflow.com/questions/28509726/how-to-effectively-know-whats-wrong-with-input-via-jsoup">

I use Jsoup to select the element. If the selection comes back empty, my page is invalid.

Elements anchors = Jsoup.parse(html).select("a[href*=stackoverflow.com/questions/]");
if (anchors.isEmpty()) {
    // Invalid: the expected anchor is missing
} else {
    // Valid
}
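
If you still want a hint about which part of the input trips an allow list, one option is a rough pre-check of your own. The sketch below is not Jsoup's API and does not reproduce the rules of Jsoup.isValid. It is a plain regex scan that reports tags and attributes missing from a hand-written allow list. The names MarkupReport and findDisallowed and the map-based allow list are hypothetical, made up for this illustration. It assumes reasonably well-formed tags and will misread attribute values that contain ">".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup): reports tags and attributes
// that fall outside a hand-written allow list.
final class MarkupReport {
    // Opening tag: group 1 is the tag name, group 2 the raw attribute text.
    private static final Pattern OPEN_TAG =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9-]*)((?:\\s[^<>]*)?)/?>");
    // One attribute: group 1 is its name; the optional value is consumed
    // so that words inside a value are not mistaken for attribute names.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?");

    // allowed maps a lowercase tag name to the attribute names permitted on it.
    static List<String> findDisallowed(String markup, Map<String, Set<String>> allowed) {
        List<String> problems = new ArrayList<>();
        Matcher tag = OPEN_TAG.matcher(markup);
        while (tag.find()) {
            String name = tag.group(1).toLowerCase();
            Set<String> allowedAttributes = allowed.get(name);
            if (allowedAttributes == null) {
                // The tag itself is not on the allow list; skip its attributes.
                problems.add("tag <" + name + "> at offset " + tag.start());
                continue;
            }
            Matcher attribute = ATTRIBUTE.matcher(tag.group(2));
            while (attribute.find()) {
                String key = attribute.group(1).toLowerCase();
                if (!allowedAttributes.contains(key)) {
                    problems.add("attribute '" + key + "' on <" + name
                            + "> at offset " + tag.start());
                }
            }
        }
        return problems;
    }
}
```

For example, with an allow list that permits div with no attributes and a with only href, scanning a div that carries a class attribute, an anchor that carries onclick, and a script element yields three entries: the class attribute, the onclick attribute, and the script tag. An empty list only means this scan found nothing. It does not guarantee that Jsoup.isValid would return true.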
