Getting links from search engines in C#

Question

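First of all, please excuse my poor English.
I want to write a metasearch engine. I first tried the Google, Bing and Yahoo APIs, but they are limited.
So now I am trying to use the HTML Agility Pack to get the result links from the search engines.
I have this code: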

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;

namespace Search
{
public partial class Form1 : Form
{
    // load snippet
    HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

    public Form1()
    {
        InitializeComponent();
    }

    private void btn1_Click(object sender, EventArgs e)
    {
        listBox1.Items.Clear();
        StringBuilder sb = new StringBuilder();
        byte[] ResultsBuffer = new byte[8192];
        string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        Stream resStream = response.GetResponseStream();
        string tempString = null;
        int count = 0;
        do
        {
            count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
            if (count != 0)
            {
                tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                sb.Append(tempString);
            }
        }

        while (count > 0);
        string sbb = sb.ToString();

        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(sbb);
        HtmlNode doc = html.DocumentNode;

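        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.SelectNodes("//a[@href]");
        if (links == null) return;
        foreach (HtmlNode link in links)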
        {
            //HtmlAttribute att = link.Attributes["href"];
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
            {
                int index = hrefValue.IndexOf("&");
                if (index > 0)
                {
                    hrefValue = hrefValue.Substring(0, index);
                    listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                }
            }
        }
    }
}

}

Can I use this code with all search engines? I changed these lines to make it work with other search engines:

if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))

and

string SearchResults = "http://yahoo.com/search?q=" + textBox1.Text.Trim();

But it does not work.

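My other problem is that this code only returns the links from the first page. What should I do if I want the links from the first N pages?
Can anyone help?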

Answer 1

First of all, you have several questions in this post. Please open a separate post for each question.

For Yahoo, "http://yahoo.com/search?q=" is not valid: if you try http://yahoo.com/search?q=stackoverflow you do not get a results page. You have to find the search URL of each search engine. For example, Yahoo's is https://search.yahoo.com/search?p= .
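A minimal sketch of keeping one search URL per engine, so the request code does not hard-code a single host. The `SearchUrls` class and its `Build` method are hypothetical names; the Yahoo prefix is the one given above, while the Google prefix is an assumption based on the URLs used elsewhere in this post. The keywords are escaped so that spaces and characters such as `&` or `#` do not break the query string.

```csharp
using System;
using System.Collections.Generic;

static class SearchUrls
{
    // URL prefix (including the query parameter name) for each engine.
    // The Google entry is an assumption; verify it in a browser first.
    static readonly Dictionary<string, string> Prefixes = new Dictionary<string, string>
    {
        { "google", "https://www.google.com/search?q=" },
        { "yahoo",  "https://search.yahoo.com/search?p=" },
    };

    // Builds the search URL for the given engine and keywords.
    public static string Build(string engine, string keywords)
    {
        string prefix;
        if (!Prefixes.TryGetValue(engine, out prefix))
            throw new ArgumentException("Unknown search engine: " + engine, "engine");

        // Escape the keywords so they are safe inside a query string.
        return prefix + Uri.EscapeDataString(keywords.Trim());
    }
}
```

In the question's click handler this would replace the string concatenation, for example `SearchUrls.Build("yahoo", txtKeyWords.Text)`.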

You also have to adapt the condition if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://")) for each search engine. For example, as written you only keep HTTP links, and HTTPS links are discarded.
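One way to make the filter less engine-specific is to move it into a small helper that accepts both HTTP and HTTPS. The sketch below is only an illustration: `ResultLinks.Extract` and its `engineName` parameter are hypothetical, and the "/url?q=" wrapper is simply what the question's code assumes for Google. Other engines may wrap their result links differently, so inspect the returned HTML before relying on it.

```csharp
using System;

static class ResultLinks
{
    // Returns the target URL for an href value, or null if the link should be skipped.
    // engineName is the engine's own name (e.g. "google"), used to drop internal links.
    public static string Extract(string href, string engineName)
    {
        if (string.IsNullOrEmpty(href)) return null;

        // Unwrap redirect links of the form /url?q=<target>&...
        const string redirectPrefix = "/url?q=";
        if (href.StartsWith(redirectPrefix, StringComparison.Ordinal))
        {
            href = href.Substring(redirectPrefix.Length);
            int amp = href.IndexOf('&');
            if (amp >= 0) href = href.Substring(0, amp);
            href = Uri.UnescapeDataString(href);
        }

        // Keep only absolute http/https URLs that do not point back to the engine.
        Uri uri;
        if (!Uri.TryCreate(href, UriKind.Absolute, out uri)) return null;
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) return null;
        if (uri.Host.IndexOf(engineName, StringComparison.OrdinalIgnoreCase) >= 0) return null;
        return uri.AbsoluteUri;
    }
}
```

Inside the `foreach` loop of the question, the whole `if` block could then become a call such as `ResultLinks.Extract(hrefValue, "google")`, adding the result to the list box when it is not null.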

Pagination

Google paginates with &start= and normally returns 10 results per page. So if you set start=20, you get results 20 to 30: https://www.google.es/search?q=stackoverflow&start=20

Yahoo also returns 10 results per page and paginates with &b=. b=1 is the first page, b=11 is the second page, and so on. Example: https://search.yahoo.com/search?p=stackoverflow&b=11
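The two offset rules above can be sketched as small URL builders, with a loop that yields the URLs of the first N pages. The `SearchPaging` class and its method names are hypothetical, the Google host is an assumption, and the 10-results-per-page figure is the usual value mentioned above rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;

static class SearchPaging
{
    const int ResultsPerPage = 10; // usual page size, as described above

    // Google: start=0 is page 1, start=10 is page 2, and so on.
    public static string GooglePage(string keywords, int page)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        int start = (page - 1) * ResultsPerPage;
        return "https://www.google.com/search?q=" + Uri.EscapeDataString(keywords) + "&start=" + start;
    }

    // Yahoo: b=1 is page 1, b=11 is page 2, and so on.
    public static string YahooPage(string keywords, int page)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        int b = (page - 1) * ResultsPerPage + 1;
        return "https://search.yahoo.com/search?p=" + Uri.EscapeDataString(keywords) + "&b=" + b;
    }

    // Yields the URLs of pages 1..n for the given page-URL builder.
    public static IEnumerable<string> FirstPages(Func<string, int, string> pageUrl, string keywords, int n)
    {
        for (int page = 1; page <= n; page++)
            yield return pageUrl(keywords, page);
    }
}
```

To collect links from the first N pages, the download-and-parse code from the question would run once for each URL returned by `SearchPaging.FirstPages(SearchPaging.YahooPage, keywords, n)`.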

Hope this helps.

Answer 1
1 Accepted 2016-05-16 10:20:45
