
Retrieving Reviews from Amazon using JSoup

I'm using JSoup to retrieve reviews from a particular Amazon page, and this is what I have so far:

    Document doc = Jsoup.connect("http://www.amazon.com/Presto-06006-Kitchen-Electric-Multi-Cooker/product-reviews/B002JM202I/ref=sr_1_2_cm_cr_acr_txt?ie=UTF8&showViewpoints=1").get();
    String title = doc.title();

    Element reviews = doc.getElementById("productReviews");
    System.out.println(reviews);

This prints the block of HTML that contains the reviews, but I want only the text, without the div tags and other markup. I then want to write all of this information to a file. How can I do this? Thanks!

Use the text() method:

    System.out.println(reviews.text());

While text() will get you all of the text as one undivided string, you'll want to first use jsoup's select(...) methods to subdivide the problem into individual review elements. I'll give you the first big division, but it will be up to you to subdivide it further:

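    // Needs java.util.ArrayList, java.util.List, org.jsoup.nodes.Element
    // and org.jsoup.select.Elements.
    public static List<Element> getReviewList(Element reviews) {
        // Each review sits in a div carrying exactly this inline style.
        Elements eles = reviews.select("div[style=margin-left:0.5em;]");
        // Elements is itself a List<Element>; copy it into a plain ArrayList.
        return new ArrayList<>(eles);
    }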

If you analyze each element, you should see how Amazon further subdivides the information it holds, including the title of the review, the date of the review, and the body text.
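The question also asks how to write the extracted text to a file, which neither answer covers. Below is a minimal sketch using only the standard library. It assumes you have already collected one string per review, for example by calling text() on each element returned by getReviewList. The class name ReviewFileWriter, the method writeReviews, the file name reviews.txt, and the sample strings are all placeholders, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReviewFileWriter {

    // Writes one review per line as UTF-8, creating the file or
    // overwriting it if it already exists.
    public static void writeReviews(Path target, List<String> reviewTexts) throws IOException {
        Files.write(target, reviewTexts, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data: in the real program each entry would come
        // from element.text() on one review element.
        List<String> reviewTexts = Arrays.asList(
                "Great multi-cooker.",
                "Stopped heating after a month.");
        writeReviews(Paths.get("reviews.txt"), reviewTexts);
    }
}
```

Since text() normalizes whitespace, each review should stay on a single line in the output file. If you later split reviews into title, date, and body, you could write those fields as separate lines or join them with a delimiter of your choice.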
