
Jsoup is returning text which I do not see in the HTML document

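import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Test {
   public static void main(String[] args) throws IOException {
     // Fetch the page and select every <p> element
     Document doc = Jsoup.connect("https://bs.to/Game-of-Thrones").get();
     Elements link = doc.select("p");

     // Prints the combined text of all matched elements
     System.out.println(link.text());
   }
}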

This is the code I use to get the only p element on the given page. However, the text I get back is not in the HTML document. It does seem to belong to the website in general, though (it's in German, so I won't post the result text here).

Also, if I loop over all p elements, I get more text that should not be in the document, but still not the text I'm looking for.

Why could that be? Thanks in advance!

Edit:

  Document doc = Jsoup.connect("https://bs.to/andere-serien")
                  .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                  .referrer("http://www.google.com")
                  .get();

Adding the userAgent did solve the issue, thanks Sean Patrick Floyd!

The server may be serving different content to different user agents. Try setting your user agent to that of a real browser.

See this question for solutions:
JSoup UserAgent, how to set it right?
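
To check whether the user agent really is the cause, one option is to fetch the same URL twice with different User-Agent headers and compare the responses. The following is only a minimal sketch using the JDK's own java.net.http client (Java 11+) rather than Jsoup; the class name UserAgentCheck, the helper names, and both user agent strings are illustrative assumptions, not values taken from the question.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class UserAgentCheck {
    // Example browser-like user agent (an assumption; any current browser string should do).
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Builds a GET request for the URL that carries the given User-Agent header.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String fetch(HttpClient client, String url, String userAgent)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(url, userAgent), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = "https://bs.to/Game-of-Thrones";
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        // "Java/1.8" stands in for a non-browser client (hypothetical value).
        String asJava = fetch(client, url, "Java/1.8");
        String asBrowser = fetch(client, url, BROWSER_UA);

        System.out.println("Non-browser UA: " + asJava.length() + " chars");
        System.out.println("Browser UA:     " + asBrowser.length() + " chars");
        System.out.println("Identical bodies: " + asJava.equals(asBrowser));
    }
}
```

If the two bodies differ noticeably, the site is probably tailoring its response to the user agent, and passing a browser-like string to Jsoup's userAgent(...) as in the edit above is the likely fix. Pages with dynamic content can differ slightly between any two requests, so a small difference on its own is not conclusive.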

