How do I get the text of a link in jsoup?

Question

I am using jsoup to parse an HTML page. This is what I have done so far:

doc = Jsoup.connect("http://www.marketimyilmazlar.com/index.php?route=product/category&path=141_77").get();

Element page_clips = doc.getElementById("page_clips");

Element page_clip_content = page_clips.getElementById("content");
Elements allProductNamesOnPage = page_clip_content.getElementsByClass("name");

Now, when I write:

allProductNamesOnPage.get(0);

it returns the following:

<div class="name">
<a href="http://www.marketimyilmazlar.com/index.php?route=product/product&amp;path=141_77&amp;product_id=4309"> here is the text</a>
</div>

What I want is the "here is the text" part of that object. Can anyone help me?

Thanks

Answer 1

If you only want to extract the text, you can call the text() method:

String text = allProductNamesOnPage.get(0).text();

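This method returns the combined text of the Element and all of its children. So if you want to make sure the text comes only from the a element, call text() on the first child: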

String text = allProductNamesOnPage.get(0).child(0).text();

See here: http://jsoup.org/cookbook/extracting-data/attributes-text-html
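To make the difference between the two calls concrete, here is a minimal sketch that parses an inline HTML string instead of fetching the live page. The class name LinkTextExample, the helper firstLinkText, the extra "note:" text, and the example.com URL are made up for illustration and are not part of the original question. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkTextExample {

    // Hypothetical helper: returns the text of the first <a> inside the
    // first element with class "name". Assumes both elements exist.
    static String firstLinkText(String html) {
        Document doc = Jsoup.parse(html);
        Element nameDiv = doc.getElementsByClass("name").get(0);
        Element link = nameDiv.select("a").first(); // null if the div has no link
        return link.text();
    }

    public static void main(String[] args) {
        // Illustrative markup: the div holds some text of its own plus a link.
        String html = "<div class=\"name\">note: "
                + "<a href=\"http://example.com/p?id=1\"> here is the text</a></div>";

        Document doc = Jsoup.parse(html);
        Element nameDiv = doc.getElementsByClass("name").get(0);

        // Text of the div together with all of its descendants.
        System.out.println(nameDiv.text());
        // Text of the first child element (the <a>) only.
        System.out.println(nameDiv.child(0).text());
        // Same result, but selecting the link by tag instead of by position.
        System.out.println(firstLinkText(html));
    }
}
```

Selecting with select("a") rather than child(0) is a design choice: it keeps working if another element is later placed before the link, at the cost of needing a null check when no link is present.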

Answer 2

You may want to iterate over the Elements you have collected and print their prices one by one:

Elements allProductPricesOnPage = page_clip_content
                .getElementsByClass("price");
for (Element el : allProductPricesOnPage) {
    System.out.println(el.text());
}

which gives:

19.99 TL KDV Dahil
9.99 TL KDV Dahil
14.99 TL KDV Dahil

This works because Elements is iterable (see the javadoc), which lets you access each individual Element object in the collection.

Each Element object that repeats in the HTML contains the relevant information you want to extract.
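If the values are needed for further processing rather than printing, the same loop can collect them into a list. The following is a sketch only: the class name PriceListExample, the helper textsByClass, and the inline sample markup are hypothetical, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PriceListExample {

    // Hypothetical helper: collects the text of every element that has
    // the given class, in document order.
    static List<String> textsByClass(String html, String className) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element el : doc.getElementsByClass(className)) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Illustrative markup standing in for the product listing page.
        String html = "<div class=\"price\">19.99 TL KDV Dahil</div>"
                + "<div class=\"price\">9.99 TL KDV Dahil</div>";

        for (String price : textsByClass(html, "price")) {
            System.out.println(price);
        }
    }
}
```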

Answer 1
1 2014-02-07 14:40:57

Answer 2
1 accepted 2014-02-07 18:17:15
