Extract comments from a blog using jsoup
Please help me extract comments from a blog (for example, one hosted on Blogger) using jsoup.
I am able to get the title, but how do I extract all the comments that people have posted on a given discussion topic?
package com.hascode.samples.jsoup;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class WebScraper {
    public static void main(final String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.hascode.com/")
                .userAgent("Mozilla").timeout(6000).get();

        // parsing the page's title
        String title = doc.title();
        System.out.println("The title of www.hascode.com is: " + title);

        // parsing the latest article's heading
        Elements heading = doc.select("h2 > a");
        System.out.println("The latest article is: " + heading.text());
        System.out.println("The article's URL is: " + heading.attr("href"));

        Elements editorial = doc.select("div.BlockContent-body small");
        System.out.println("The article was created: " + editorial.text());
    }
}
I am trying to extract the comments and show them in a JFrame, but there is no output. Here is my code:
public class SimpleWebCrawler extends JFrame {

    JTextField yourInputField = new JTextField(20);
    static JTextArea _resultArea = new JTextArea(100, 100);
    JScrollPane scrollingArea = new JScrollPane(_resultArea);
    private final static String newline = "\n";

    public SimpleWebCrawler() throws MalformedURLException {
        _resultArea.setEditable(false);
        System.out.println("Please enter the website :");
        Scanner scan2 = new Scanner(System.in);
        String word2 = scan2.nextLine();

        try {
            URL my_url = new URL("http://" + word2 + "/");
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    my_url.openStream()));
            String strTemp = "";
            while (null != (strTemp = br.readLine())) {
                _resultArea.append(strTemp + newline);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        _resultArea.append("\n");
        _resultArea.append("\n");
        _resultArea.append("\n");

        String url = "http://" + word2 + "/";
        print("Fetching %s...", url);

        try {
            Document articlePage = Jsoup.connect(url).get();
            Elements comments = articlePage.select(".comments .comment-body");
            System.out.println("\n");
            BufferedWriter bw = new BufferedWriter(new FileWriter("C:\\Users\\user\\fypworkspace\\FYP\\Link\\abc.txt"));
            _resultArea.append("\n");
            for (Element comment : comments) {
                print(" %s ", comment.text());
                bw.write(comment.text());
                bw.write(System.getProperty("line.separator"));
            }
            bw.flush();
            bw.close();
        } catch (IOException e1) {
        }

        JPanel content = new JPanel();
        content.setLayout(new BorderLayout());
        content.add(scrollingArea, BorderLayout.CENTER);
        this.setContentPane(content);
        this.setTitle("Crawled Links");
        this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        this.pack();
    }

    private static void print(String msg, Object... args) {
        _resultArea.append(String.format(msg, args) + newline);
    }

    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }

    //.. Get the content pane, set layout, add to center
    public static void main(String[] args) throws IOException {
        JFrame win = new SimpleWebCrawler();
        win.setVisible(true);
    }
}
Just open the article page and scrape the comments from there. Each comment is one <li> element in a <ul> with class commentsList, so you can get them all like this:
Document articlePage = Jsoup.connect(heading.attr("href")).get();
Elements comments = articlePage.select(".commentsList li");
for (Element comment : comments) {
    System.out.println("Comment: " + comment.text());
}
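To try the selector without a network connection, here is a minimal, self-contained sketch that runs the same query against an in-memory HTML string. The class name CommentExtractor, the helper extractComments, and the sample markup are all made up for illustration. The commentsList class is the one described in the answer; a different blog will most likely use different class names, so inspect the page source and adjust the selector.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CommentExtractor {

    // Hypothetical helper: returns the text of every <li> found inside
    // an element with class "commentsList".
    static List<String> extractComments(String html) {
        Document page = Jsoup.parse(html);
        List<String> comments = new ArrayList<>();
        for (Element comment : page.select(".commentsList li")) {
            comments.add(comment.text());
        }
        return comments;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a real article page.
        String html = "<html><body>"
                + "<ul class=\"commentsList\">"
                + "<li>Great article!</li>"
                + "<li>Thanks for sharing.</li>"
                + "</ul>"
                + "<ul class=\"other\"><li>Not a comment</li></ul>"
                + "</body></html>";

        List<String> comments = extractComments(html);
        if (comments.isEmpty()) {
            // An empty result usually means the selector does not match the markup.
            System.out.println("No comments matched the selector.");
        }
        for (String comment : comments) {
            System.out.println("Comment: " + comment);
        }
    }
}
```

Checking for an empty result is a quick way to tell "the selector matched nothing" apart from "the request failed". In the question's Swing code, both cases produce no output, because the catch block for IOException is empty.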