Jsoup: extract hrefs from the HTML content

My problem is that I am trying to get the hrefs from this site with Jsoup:

https://www.amazon.de/s?k=kissen&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2

but it does not work.

I tried to select the links by their class like this:

Elements elements = documentMainSite.select(".a-link-normal");

After that, I tried to extract the hrefs with the following piece of code:

for (Element element : elements) {
  String href = element.attributes().get("href");
}

But unfortunately it gives me nothing...

Can someone please tell me where my mistake is?


I don't just connect to the website. I also save the hrefs in a string by extracting them with

String href = element.attributes().get("href");

After that I print the href string, but it is empty.

On the other hand, the code works with another CSS selector, so it has nothing to do with the code itself. It's just the CSS selector (.a-link-normal) that is probably wrong.

You won't get anything by simply connecting to the URL via Jsoup.

Document document = Jsoup.connect(yourUrl).get();
String bodyText = document.getElementsByTag("body").get(0).text();

Here is the translation of the body text that I got from the code above:

Enter the characters below We ask for your understanding and want to be sure that you are not a bot. For best results, please use a browser that accepts cookies. Type the characters you see in the image: Enter characters Try another image Continue shopping Terms & Conditions Privacy Policy © 1996-2015, Amazon.com, Inc. or its affiliates

You need to either bypass the captcha or emulate a browser, for example by means of Selenium.
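
Before blaming the selector, it may help to check whether the response is the bot-check page at all. Below is a minimal sketch of such a check using only the Java standard library. The class name `BotCheckDetector`, the method `looksLikeBotCheck`, and the marker phrases are all hypothetical. The phrases are copied from the translated body text above, so on amazon.de they would have to be replaced with the German wording the site actually returns.

```java
import java.util.List;
import java.util.Locale;

public class BotCheckDetector {

    // Assumption: phrases taken from the translated body text quoted above.
    // The real amazon.de page is in German, so adjust these to the actual text.
    static final List<String> DEFAULT_MARKERS = List.of(
            "enter the characters below",
            "type the characters you see in the image",
            "you are not a bot");

    // Returns true if the body text contains any marker phrase (case-insensitive).
    public static boolean looksLikeBotCheck(String bodyText, List<String> markers) {
        if (bodyText == null) {
            return false;
        }
        String normalized = bodyText.toLowerCase(Locale.ROOT);
        for (String marker : markers) {
            if (normalized.contains(marker.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Convenience overload that uses the default (assumed) marker phrases.
    public static boolean looksLikeBotCheck(String bodyText) {
        return looksLikeBotCheck(bodyText, DEFAULT_MARKERS);
    }

    public static void main(String[] args) {
        String blocked = "Enter the characters below We ask for your understanding";
        String normal = "Kissen 40x40 cm, 2er Set";
        System.out.println(looksLikeBotCheck(blocked)); // prints true
        System.out.println(looksLikeBotCheck(normal));  // prints false
    }
}
```

With the code above, the `bodyText` string could be passed to `looksLikeBotCheck` before running `select(".a-link-normal")`. If it returns true, an empty result says nothing about whether the selector is right, because the product links were never in the downloaded HTML.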
