How to get all websites related to a keyword in a Windows Forms C# application

Question

Here is my process: I have a textbox where the user enters a keyword, for example "games". After the user presses Enter, all the websites related to games should be output in the Windows form.

I first tried the Google Custom Search API, using this code:

using System;
using Google.Apis.Customsearch.v1;       // CustomsearchService
using Google.Apis.Customsearch.v1.Data;  // Search

const string apiKey = "";          // your API key
const string searchEngineId = "";  // your custom search engine ID (cx)
const string query = "games";
CustomsearchService customSearchService = new CustomsearchService(new Google.Apis.Services.BaseClientService.Initializer() { ApiKey = apiKey });
Google.Apis.Customsearch.v1.CseResource.ListRequest listRequest = customSearchService.Cse.List(query);
listRequest.Cx = searchEngineId;
Search search = listRequest.Execute();
// Items is null when the query returns no results
if (search.Items != null)
{
    foreach (var item in search.Items)
    {
        Console.WriteLine("Title : " + item.Title + Environment.NewLine + "Link : " + item.Link + Environment.NewLine + Environment.NewLine);
    }
}

But my problem is that the limits of 100 queries per day and 10 results per query do not work for my use case.

So I decided to use the HttpWebRequest and HttpWebResponse approach. Here is the code, which I found on the internet:

StringBuilder sb = new StringBuilder();

// used on each read operation
byte[] buf = new byte[8192];
string GS = "http://google.com/search?q=sample";
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(GS);

// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

// we will read data via the response stream
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
    // fill the buffer with data
    count = resStream.Read(buf, 0, buf.Length);
    // make sure we read some data
    if (count != 0)
    {
        // translate from bytes to ASCII text
        tempString = Encoding.ASCII.GetString(buf, 0, count);

        // continue building the string
        sb.Append(tempString);
    }
}
while (count > 0);

My problem with this is that it returns the whole HTML. Is it possible to get only the URLs, as with the Google Search API?

Answer 1

That is just how it works: you either have to pay for the API or parse the HTML, and the legality of the latter is questionable.

Answer 2

Using an HTML parser with CSS selectors, it is not that much work (the solution is based on this Java tutorial: http://mph-web.de/web-scraping-with-java-top-10-google-search-results/). I used Dcsoup (https://github.com/matarillo/dcsoup, an incomplete Jsoup port) for the example, since I'm used to Jsoup (https://jsoup.org/apidocs/), but there might be other HTML parsers for C# that are better maintained.

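// query results on page 14, to demonstrate that the limit of results is avoided
// ("start" is a zero-based result offset, so 130 is the first result of page 14 at 10 results per page)
int resultPage = 130;
string keyword = "test";
// escape the keyword so that spaces and special characters survive in the query string
string searchUrl = "http://www.google.com/search?q=" + System.Uri.EscapeDataString(keyword) + "&start=" + resultPage;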

System.Net.WebClient webClient = new System.Net.WebClient();
string htmlResult = webClient.DownloadString(searchUrl);

Supremes.Nodes.Document doc = Supremes.Dcsoup.Parse(htmlResult, "http://www.google.com/");

// parse with css selector
foreach (Supremes.Nodes.Element result in doc.Select("h3.r a")) 
{
    string title = result.Text;
    string url = result.Attr("href");

    // do something useful with the search result
    System.Diagnostics.Debug.WriteLine(title + " -> " + url);
}

The needed selector h3.r a might change. A more stable alternative might be to parse all elements and retrieve those with an href attribute, or at least to have a built-in check: search for a term that has a lot of results, parse the page, and if your selector finds no results, send yourself a notification so you can repair the selector.
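As a rough illustration of that fallback, here is a minimal sketch using only the .NET base class library. It is an assumption-laden simplification, not the Dcsoup approach from the example above: the class name LinkFallback and both method names are hypothetical, and a regular expression stands in for a real HTML parser, so it will miss unquoted or unusually formatted attributes. ExtractLinks collects every absolute http(s) href on the page, and ReportIfSelectorBroken is the built-in check: you pass it the number of hits your selector produced for a query that should have many results.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Dcsoup or any Google API.
public static class LinkFallback
{
    // Deliberately simple pattern: matches double- or single-quoted href values on <a> tags.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every absolute http(s) link found in the markup, in page order, without duplicates.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var seen = new HashSet<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            string href = match.Groups[1].Success ? match.Groups[1].Value : match.Groups[2].Value;
            href = WebUtility.HtmlDecode(href); // turn &amp; back into &

            Uri uri;
            bool isWebLink = Uri.TryCreate(href, UriKind.Absolute, out uri)
                && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
            if (isWebLink && seen.Add(href))
            {
                links.Add(href);
            }
        }
        return links;
    }

    // Built-in check: call this with the hit count for a query that should return many results.
    // Invokes notify and returns false when the selector found nothing.
    public static bool ReportIfSelectorBroken(int selectorHits, Action<string> notify)
    {
        if (selectorHits > 0)
        {
            return true;
        }
        notify("Selector returned no results for a known-good query; it probably needs repair.");
        return false;
    }
}
```

You could call LinkFallback.ExtractLinks(htmlResult) on the downloaded page when the selector comes back empty, and wire the notify callback to whatever fits your application (a log entry, an e-mail, a message box). Note that the fallback returns every link on the page, including navigation and advertising links, so some filtering would still be needed.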

See also this answer regarding getting the results for the exact search term: https://stackoverflow.com/a/37268746/1661938

Answer 1: score 2, accepted, 2016-06-08 07:43:40
Answer 2: score 0, 2016-06-08 09:07:38
