Java Web Crawler for retrieving Google search results

Question

This question has been asked many times. However, some of the APIs have changed over time, and I'd like to know a good way to do this today.

The best way would be the Google Search API. However, https://developers.google.com/custom-search/json-api/v1/overview says that only 100 search queries per day are free. I need more than that, and I don't want to pay for it.
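For comparison with scraping, a request to the Custom Search JSON API can be sketched as below. This is a minimal sketch under stated assumptions: Java 11+ (for java.net.http.HttpClient), your own API key, and the ID (cx) of a Programmable Search Engine you have set up. The class and method names (CustomSearchClient, buildUrl, search) are hypothetical, and every call to search counts against the 100-free-queries-per-day quota. The API returns at most 10 results per request (num), so more results mean paging with the start parameter, one query per page. Parsing the JSON body is left out to keep the sketch dependency-free.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper around the Custom Search JSON API endpoint.
final class CustomSearchClient {
    private static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    // Builds the request URL; apiKey and engineId are placeholders you must supply.
    static String buildUrl(String apiKey, String engineId, String query, int num) {
        if (num < 1 || num > 10) {
            // The API caps num at 10 results per request.
            throw new IllegalArgumentException("num must be between 1 and 10");
        }
        return ENDPOINT
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&cx=" + URLEncoder.encode(engineId, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&num=" + num;
    }

    // Sends the request and returns the raw JSON body; each call uses one query of the daily quota.
    static String search(String apiKey, String engineId, String query, int num)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(buildUrl(apiKey, engineId, query, num)))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URL; calling search(...) needs a real key and engine ID.
        System.out.println(buildUrl("YOUR_API_KEY", "YOUR_ENGINE_ID", "Natural Language Processing", 10));
    }
}
```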

I tried a simple REST API call, but the response is mostly JavaScript code, and I can't seem to find what I need in it.

I tried some libraries, such as http://jsoup.org/, but even its response doesn't contain the information I need.
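A likely common cause of both problems: neither a plain HTTP request nor Jsoup executes JavaScript, so any content the page builds client-side is missing from the markup you can parse, and may appear only as string literals inside script elements. The stdlib-only sketch below is a rough diagnostic under that assumption: it checks whether a piece of text occurs in the static markup once script elements are removed. The names StaticHtmlCheck, stripScripts and inStaticMarkup are hypothetical, and the regex-based stripping is not a real HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical diagnostic: is a piece of text present outside <script> elements?
final class StaticHtmlCheck {
    // Matches <script ...> ... </script> elements, case-insensitively and across line breaks.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Removes script elements, leaving the static markup a non-JavaScript client can parse.
    static String stripScripts(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    // True if needle appears in the HTML after script elements are removed.
    static boolean inStaticMarkup(String html, String needle) {
        return stripScripts(html).contains(needle);
    }

    public static void main(String[] args) {
        // The result text exists only inside JavaScript that a browser would run.
        String html = "<html><body><div id=\"results\"></div>"
                + "<script>document.getElementById('results').innerHTML = '<h3>Result title</h3>';</script>"
                + "</body></html>";
        System.out.println(html.contains("Result title"));        // prints true
        System.out.println(inStaticMarkup(html, "Result title")); // prints false
    }
}
```

If the text you need only shows up inside scripts, a selector-based parser such as Jsoup will not find it.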

Answer 1

See this Jsoup crawler example: http://www.mkyong.com/java/jsoup-send-search-query-to-google/

In Java, I use crawler4j: https://code.google.com/p/crawler4j/
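crawler4j's own classes are not shown here. As a dependency-free illustration of the core loop a crawler library manages for you, the sketch below does a breadth-first traversal with a visited set and a page limit. The name FrontierCrawler.crawl is hypothetical, and fetching a page plus extracting its links is abstracted as a function so the loop can be exercised on an in-memory link graph. Keep in mind that Google's robots.txt disallows /search, so a crawler configured to honor robots.txt will skip result pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical breadth-first crawler loop; fetchLinks stands in for HTTP fetching and link extraction.
final class FrontierCrawler {
    // Visits pages breadth-first from seed, never queuing a URL twice,
    // and stops after maxPages pages. Returns URLs in visit order.
    static List<String> crawl(String seed, Function<String, List<String>> fetchLinks, int maxPages) {
        List<String> visitedOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visitedOrder.size() < maxPages) {
            String url = frontier.poll();
            visitedOrder.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Set.add returns false if the URL was already queued or visited.
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        // In-memory link graph standing in for real pages.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of("d"),
                "d", List.of());
        System.out.println(crawl("a", url -> graph.getOrDefault(url, List.of()), 10)); // prints [a, b, c, d]
    }
}
```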

Answer 2

I tried Jsoup and it worked, although the first few results include some unwanted characters. Here is my code:

package crawl_google;

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GoogleResults {

    public static void main(String[] args) throws Exception {
        // Pass the search query and the number of results as parameters
        googleResults("Natural Language Processing", 10);
    }

    public static void googleResults(String keyword, int numResults) throws Exception {
        // URL-encode the query: spaces become +, other special characters are escaped
        String query = URLEncoder.encode(keyword, StandardCharsets.UTF_8.name());
        String url = "https://www.google.com/search?q=" + query + "&num=" + numResults;

        // Connect to the URL and obtain the HTML response
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla")
                .timeout(5000)
                .get();

        // Parse the HTML; these selectors match the result markup Google served
        // when this was written and may need updating if that markup changes
        Elements results = doc.select("li.g");
        for (Element result : results) {
            // Print title, site and abstract
            System.out.println("Title : " + result.getElementsByTag("h3").text());
            System.out.println("Site : " + result.getElementsByTag("cite").text());
            System.out.println("Abstract : " + result.getElementsByTag("span").text() + "\n");
        }
    }
}
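The answer does not say which unwanted characters showed up. Assuming they are invisible formatting characters (such as zero-width spaces), control characters, or irregular whitespace in the extracted text, a small stdlib-only helper could be applied to each text() value before printing. The name TextCleanup.clean is hypothetical. If the noise is instead other page content picked up by a broad selector such as span, a narrower CSS selector is the better fix.

```java
import java.util.regex.Pattern;

// Hypothetical cleanup for text extracted from scraped HTML.
final class TextCleanup {
    // Invisible formatting characters (Unicode category Cf), e.g. U+200B zero-width space.
    private static final Pattern FORMAT = Pattern.compile("\\p{Cf}");
    // Runs of control characters (Cc, which includes tabs and newlines), whitespace, or non-breaking spaces.
    private static final Pattern CONTROL_OR_SPACE = Pattern.compile("[\\p{Cc}\\s\\u00A0]+");

    // Drops formatting characters, collapses control/whitespace runs to one space, and trims.
    static String clean(String raw) {
        String noFormat = FORMAT.matcher(raw).replaceAll("");
        return CONTROL_OR_SPACE.matcher(noFormat).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "  Natural\u200B Language\n\tProcessing\u00A0 ";
        System.out.println(clean(raw)); // prints Natural Language Processing
    }
}
```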


2 answers

Answer 1: votes: 1, posted 2014-12-27 17:04:48

Answer 2: votes: 1, accepted, posted 2015-01-01 05:57:24
