
How to get all the websites related to a keyword in a Windows Forms app (C#)

Here is my process: I have a textbox where the user enters a keyword, for example "games". After pressing Enter, all the websites related to games should be listed in the Windows Form.

I first tried the Google Custom Search API with this code:

using System;
using Google.Apis.Customsearch.v1;
using Google.Apis.Customsearch.v1.Data;
using Google.Apis.Services;

const string apiKey = "";          // your API key
const string searchEngineId = "";  // your search engine ID (cx)
const string query = "games";

// create the service and a request for the query, bound to the search engine
CustomsearchService customSearchService = new CustomsearchService(new BaseClientService.Initializer() { ApiKey = apiKey });
CseResource.ListRequest listRequest = customSearchService.Cse.List(query);
listRequest.Cx = searchEngineId;
Search search = listRequest.Execute();
// Items is null when the query returns no results
if (search.Items != null)
{
    foreach (var item in search.Items)
    {
        Console.WriteLine("Title : " + item.Title + Environment.NewLine + "Link : " + item.Link + Environment.NewLine);
    }
}

My problem is that the free quota of 100 queries per day and 10 results per query is not workable for me.
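For reference, the Custom Search JSON API can be paged with its 1-based `start` parameter, but Google documents that it never returns more than 100 results for a query, so paging only raises the ceiling from 10 to 100 results. A minimal sketch of computing the page offsets, assuming 10 results per request; the class and method names are hypothetical, and the assumption that each value would be assigned to a `Start` property on `ListRequest` should be verified against your package version:

```csharp
using System;
using System.Collections.Generic;

public static class CustomSearchPaging
{
    // Assumptions: 10 results per request, and the API's documented
    // ceiling of 100 results per query.
    const int PageSize = 10;
    const int MaxResults = 100;

    // Returns the 1-based "start" values needed to fetch up to `wanted` results.
    public static List<int> StartIndices(int wanted)
    {
        var starts = new List<int>();
        int limit = Math.Min(wanted, MaxResults);
        for (int start = 1; start <= limit; start += PageSize)
        {
            starts.Add(start);
        }
        return starts;
    }

    public static void Main()
    {
        // Asking for 250 results still yields only 10 pages: 1, 11, ..., 91.
        Console.WriteLine(string.Join(", ", StartIndices(250)));
    }
}
```

Each page still counts as one query against the daily quota, so fetching 100 results costs 10 of the 100 free daily queries.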

So I switched to the HttpWebRequest/HttpWebResponse approach, using this code I found online:

using System.IO;
using System.Net;
using System.Text;

StringBuilder sb = new StringBuilder();

// used on each read operation
char[] buf = new char[8192];
string GS = "http://google.com/search?q=sample";

// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(GS);

// execute the request; "using" disposes the response and its stream
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream resStream = response.GetResponseStream())
// decode as UTF-8 (assumed page encoding) rather than ASCII; the reader also
// handles multi-byte characters that are split across two reads
using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8))
{
    int count;
    // fill the buffer until the end of the stream is reached
    while ((count = reader.Read(buf, 0, buf.Length)) > 0)
    {
        // continue building the string
        sb.Append(buf, 0, count);
    }
}
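Decoding an HTML response with `Encoding.ASCII` replaces every byte above 0x7F with `?`, so any non-English title or URL on the page comes out garbled; decoding fixed-size byte chunks one at a time can additionally split a multi-byte UTF-8 character across two reads. A small self-contained demonstration, assuming the page is UTF-8 (the class name is hypothetical):

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Simulates decoding a UTF-8 page body as ASCII.
    public static string DecodeAsAscii(string text) =>
        Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(text));

    // Decoding with the page's real encoding round-trips the text.
    public static string DecodeAsUtf8(string text) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        string title = "Caf\u00e9 games";
        // U+00E9 is 2 bytes in UTF-8, and ASCII turns each of them into '?'
        Console.WriteLine(DecodeAsAscii(title));
        Console.WriteLine(DecodeAsUtf8(title));
    }
}
```

This is why the response should be read through a `StreamReader` (or similar) with the page's actual encoding rather than decoded byte-by-byte as ASCII.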

My problem with this is that it returns the whole HTML page. Is it possible to get only the URLs, as with the Google Search API?

That's just how it works: you either have to pay for the API or parse the HTML, and the legality of the latter is questionable.

With an HTML parser that supports CSS selectors, this is not much work (the solution is based on this Java tutorial: http://mph-web.de/web-scraping-with-java-top-10-google-search-results/ ). I used Dcsoup ( https://github.com/matarillo/dcsoup , an incomplete Jsoup port) for the example, since I'm used to Jsoup ( https://jsoup.org/apidocs/ ), but there may be other HTML parsers for C# that are better maintained.

using System;
using System.Net;

// query results on page 14 (start is a 0-based result offset, 10 results per page),
// to demonstrate that the limit on the number of results is avoided
int resultPage = 130;
string keyword = "test";
// escape the keyword so spaces or '&' don't break the query string
string searchUrl = "http://www.google.com/search?q=" + Uri.EscapeDataString(keyword) + "&start=" + resultPage;

string htmlResult;
using (WebClient webClient = new WebClient())
{
    htmlResult = webClient.DownloadString(searchUrl);
}

Supremes.Nodes.Document doc = Supremes.Dcsoup.Parse(htmlResult, "http://www.google.com/");

// parse with a CSS selector: the links inside the result headings
foreach (Supremes.Nodes.Element result in doc.Select("h3.r a"))
{
    string title = result.Text;
    string url = result.Attr("href");

    // do something useful with the search result
    System.Diagnostics.Debug.WriteLine(title + " -> " + url);
}
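The `start` parameter in the scraped URL is a 0-based result offset with 10 results per page, which is why `start=130` lands on page 14. A minimal sketch of building the URLs for several consecutive pages, with the keyword escaped via `Uri.EscapeDataString`; `BuildSearchUrls` is a hypothetical helper, and Google may still throttle or block automated requests no matter how the URLs are built:

```csharp
using System;
using System.Collections.Generic;

public static class SearchUrls
{
    // Builds Google result-page URLs for the first `pages` pages of `keyword`.
    // Assumption: 10 results per page, "start" is the 0-based result offset.
    public static List<string> BuildSearchUrls(string keyword, int pages)
    {
        var urls = new List<string>();
        string q = Uri.EscapeDataString(keyword); // "free games" -> "free%20games"
        for (int page = 0; page < pages; page++)
        {
            urls.Add("https://www.google.com/search?q=" + q + "&start=" + (page * 10));
        }
        return urls;
    }

    public static void Main()
    {
        foreach (string url in BuildSearchUrls("free games", 3))
        {
            Console.WriteLine(url);
        }
    }
}
```

Each URL can then be downloaded and parsed as above; adding a delay between requests makes blocking less likely but does not prevent it.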

The selector h3.r a will break whenever Google changes its markup. A more stable alternative is to parse all elements and keep those with an href attribute, or at least to build in a check: regularly search for a term known to return many results, and if your selector finds nothing, send yourself a notification so you can repair the selector.
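The selector-free alternative and the built-in check can be sketched together: collect every `href` on the page and keep only external http(s) links. To stay dependency-free, the sketch uses a regular expression as a crude stand-in for a real HTML parser, and it assumes that Google may wrap result links as `/url?q=<target>&...`, which has been common for non-JavaScript clients but is not guaranteed. `ExtractResultUrls` and `ParsingLooksBroken` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Crude matcher for double-quoted href attributes; a real HTML parser is more reliable.
    static readonly Regex Href = new Regex("href=\"([^\"]+)\"", RegexOptions.IgnoreCase);
    const string Wrapper = "/url?q=";

    // Returns distinct external http(s) links found anywhere in the page.
    public static List<string> ExtractResultUrls(string html)
    {
        var results = new List<string>();
        foreach (Match m in Href.Matches(html))
        {
            // turn "&amp;" etc. back into plain characters
            string href = System.Net.WebUtility.HtmlDecode(m.Groups[1].Value);

            // Assumption: result links may be wrapped as /url?q=<target>&...
            if (href.StartsWith(Wrapper, StringComparison.Ordinal))
            {
                int end = href.IndexOf('&');
                string encoded = end < 0
                    ? href.Substring(Wrapper.Length)
                    : href.Substring(Wrapper.Length, end - Wrapper.Length);
                href = Uri.UnescapeDataString(encoded);
            }

            // keep absolute http(s) links that don't point back to Google
            if (Uri.TryCreate(href, UriKind.Absolute, out Uri uri)
                && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
                && !uri.Host.Contains("google.")
                && !results.Contains(href))
            {
                results.Add(href);
            }
        }
        return results;
    }

    // Built-in check: a popular term should always produce links,
    // so an empty result suggests the parsing needs repair.
    public static bool ParsingLooksBroken(string htmlForPopularTerm) =>
        ExtractResultUrls(htmlForPopularTerm).Count == 0;

    public static void Main()
    {
        string html = "<a href=\"/url?q=https://example.com/games&amp;sa=U\">Games</a>"
                    + "<a href=\"https://www.google.com/preferences\">Settings</a>"
                    + "<a href=\"https://store.example.org/\">Store</a>";
        foreach (string url in ExtractResultUrls(html))
        {
            Console.WriteLine(url);
        }
        if (ParsingLooksBroken(html))
        {
            Console.WriteLine("No result links found; check the parsing logic.");
        }
    }
}
```

The trade-off is that this also picks up ads and other non-result links, so it is less precise than a working selector but degrades more gracefully when the markup changes.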

See also this answer regarding getting the results for the exact search term: https://stackoverflow.com/a/37268746/1661938
