
Jsoup extract Hrefs from the HTML content

I am trying to get the hrefs from this site with Jsoup

https://www.amazon.de/s?k=kissen&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2

but it does not work.

I tried to select the links by their class like this:

Elements elements = documentMainSite.select(".a-link-normal");

and after that I tried to extract the Hrefs with the following piece of code.

for (Element element : elements) {
  // get() returns an empty string when the attribute is missing
  String href = element.attributes().get("href");
  System.out.println(href);
}

but unfortunately it gives me nothing.

Can someone please tell me where my mistake is?


I don't just connect to the website. I also save each href in a string by extracting it with

String href = element.attributes().get("href");

After that I print the href string, but it is empty.

On the other hand, the code works with a different CSS selector, so the code itself is not the problem. It is just the CSS selector (.a-link-normal) that is probably wrong.

You won't get anything by simply connecting to the URL via Jsoup.

Document document = Jsoup.connect(yourUrl).get();
String bodyText = document.getElementsByTag("body").get(0).text();

Here is the translation of the body text that I got from the code above:

Enter the characters below We ask for your understanding and want to be sure that you are not a bot. For best results, please use a browser that accepts cookies. Type the characters you see in the image: Enter characters Try another image Continue shopping Terms & Conditions Privacy Policy © 1996-2015, Amazon.com, Inc. or its affiliates
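In other words, the response is a captcha page rather than the search results, which is why the selector matches nothing. A rough way to notice this in code is to look for phrases from that page in the body text before selecting anything. This is only a sketch: the class name CaptchaCheck, the method looksLikeCaptcha, and the marker phrases (taken from the English translation above; the real page is probably localized) are assumptions for illustration.

```java
import java.util.Locale;

class CaptchaCheck {
    // Marker phrases copied from the translated captcha text quoted above.
    // Assumption: the actual page may use different (localized) wording.
    private static final String[] MARKERS = {
        "enter the characters",
        "not a bot",
        "type the characters you see"
    };

    // Returns true if the body text contains any of the marker phrases.
    static boolean looksLikeCaptcha(String bodyText) {
        if (bodyText == null) {
            return false;
        }
        String lower = bodyText.toLowerCase(Locale.ROOT);
        for (String marker : MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String blocked = "Enter the characters below We ask for your understanding "
                + "and want to be sure that you are not a bot.";
        String normal = "Kissen 40x40 cm, 2er Set";
        System.out.println(looksLikeCaptcha(blocked));
        System.out.println(looksLikeCaptcha(normal));
    }
}
```

In the real program the string would be the body text obtained from the Jsoup document, and the link extraction would only run when the check returns false.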

You either need to get past the captcha or emulate a browser, for example with Selenium.
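Once you have the real results HTML (however you obtained it), the extraction itself is straightforward. Here is a minimal sketch that parses an HTML string with Jsoup and collects the absolute URLs of all .a-link-normal anchors that actually carry an href. The HTML snippet, the class name HrefExtractor, and the method extractLinks are made up for illustration; the real Amazon markup may differ.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HrefExtractor {
    // Parses the HTML and returns the absolute URL of every
    // <a class="a-link-normal"> element that has an href attribute.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a.a-link-normal[href]")) {
            // absUrl resolves relative hrefs against the base URI
            links.add(anchor.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical markup, only for demonstration
        String html = "<a class=\"a-link-normal\" href=\"/dp/EXAMPLE1\">Kissen</a>"
                + "<a class=\"a-link-normal\">no href</a>"
                + "<a class=\"other\" href=\"/x\">other</a>";
        for (String link : extractLinks(html, "https://www.amazon.de/")) {
            System.out.println(link);
        }
    }
}
```

Adding [href] to the selector skips anchors without the attribute, so you don't end up with empty strings in the result.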
