
Google Search with JSoup

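I am trying to search Google with JSoup. My problem is that when I start the search, the query variable does not show the URL I want. Also, how does Jsoup search? Does it look at the title, the URL, or something else?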

public class Start {

public static void main(String[] args) {
    try {
        new Google().Searching("Möbel Beck GmbH & Co.KG");
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
}

}

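import java.io.IOException;
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Google implements Serializable {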

private static final long serialVersionUID = 1L;

private static Pattern patternDomainName;
private Matcher matcher;
private static final String DOMAIN_NAME_PATTERN = "([a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}";
static {
    patternDomainName = Pattern.compile(DOMAIN_NAME_PATTERN);
}

public void Searching(String searchstring) throws IOException {

    Google obj = new Google();
    Set<String> result = obj.getDataFromGoogle(searchstring);
    for (String temp : result) {

        if (temp.contains(searchstring)) {
            System.out.println(temp + " ----> CONTAINS");
        } else {
            System.out.println(temp);
        }
    }
    System.out.println(result.size());

}

public String getDomainName(String url) {

    String domainName = "";
    matcher = patternDomainName.matcher(url);
    if (matcher.find()) {
        domainName = matcher.group(0).toLowerCase().trim();
    }
    return domainName;

}

private Set<String> getDataFromGoogle(String query) {

    Set<String> result = new HashSet<String>();
    String request = "https://www.google.com/search?q=" + query;
    System.out.println("Sending request..." + request);

    try {

        // need http protocol, set this as a Google bot agent :)
        Document doc = Jsoup.connect(request)
                .userAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)").timeout(6000)
                .get();

        // get all links
        Elements links = doc.select("a[href]");
        for (Element link : links) {

            String temp = link.attr("href");
            if (temp.startsWith("/url?q=")) {
                // use regex to get domain name
                result.add(getDomainName(temp));
            }

        }

    } catch (IOException e) {
        e.printStackTrace();
    }

    return result;
}

}

Parsing the Google site directly is not a good idea. You could try using the Google API instead: https://developers.google.com/web-search/docs/#java-access
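
If you still want to experiment with the scraping approach, two details in the question's code are worth checking. First, the request URL is built by concatenating the raw search string, so the spaces, the `ö`, and the `&` in "Möbel Beck GmbH & Co.KG" are not escaped, and the `&` in particular cuts the `q` parameter short. Second, Jsoup does not search anything itself: it downloads the page, and `doc.select("a[href]")` only picks out the anchor elements, after which the code reduces each `href` to a host name. The sketch below shows both steps using only the standard library (Java 10 or later for the `Charset` overloads). The class name `GoogleQuerySketch`, the method names, and the sample `href` are assumptions made for illustration, and the `/url?q=` prefix is taken from the question's code; the markup Google actually returns may differ.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; none of these names come from Jsoup or Google.
class GoogleQuerySketch {

    // Same host-name pattern as DOMAIN_NAME_PATTERN in the question.
    private static final Pattern DOMAIN = Pattern.compile(
            "([a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}");

    // Percent-encode the query so spaces, umlauts and '&' survive in the URL.
    static String buildSearchUrl(String query) {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Reduce a result link of the form "/url?q=<target>&..." to its host name.
    // Returns an empty string for links that do not have that form.
    static String targetDomain(String href) {
        String prefix = "/url?q=";
        if (!href.startsWith(prefix)) {
            return "";
        }
        String target = href.substring(prefix.length());
        int amp = target.indexOf('&');          // drop trailing tracking parameters
        if (amp >= 0) {
            target = target.substring(0, amp);
        }
        target = URLDecoder.decode(target, StandardCharsets.UTF_8);
        Matcher m = DOMAIN.matcher(target);
        return m.find() ? m.group(0).toLowerCase() : "";
    }

    public static void main(String[] args) {
        // "\u00f6" is the letter o with umlaut, written as an escape so the
        // example does not depend on the source file encoding.
        System.out.println(buildSearchUrl("M\u00f6bel Beck GmbH & Co.KG"));
        // prints https://www.google.com/search?q=M%C3%B6bel+Beck+GmbH+%26+Co.KG

        // Made-up href in the shape the question's code expects.
        System.out.println(targetDomain("/url?q=https://www.example.com/page&sa=U"));
        // prints www.example.com
    }
}
```

Because the result set only ever contains host names, the check `temp.contains(searchstring)` in `Searching` will practically never match a company name that contains spaces; comparing against a normalized form of the name would be one way to make that check meaningful.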

