Get links from search engines in C#
First of all, excuse me for my broken English.
I want to write a metasearch engine. First I tried to use the Google, Bing and Yahoo APIs, but they were limited.
Then I tried to use HtmlAgilityPack to get the result links from the search engines.
I have this code:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;
namespace Search
{
    public partial class Form1 : Form
    {
        // load snippet
        HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

        public Form1()
        {
            InitializeComponent();
        }

        private void btn1_Click(object sender, EventArgs e)
        {
            listBox1.Items.Clear();
            StringBuilder sb = new StringBuilder();
            byte[] ResultsBuffer = new byte[8192];
            string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            Stream resStream = response.GetResponseStream();
            string tempString = null;
            int count = 0;
            do
            {
                count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
                if (count != 0)
                {
                    tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                    sb.Append(tempString);
                }
            }
            while (count > 0);
            string sbb = sb.ToString();
            HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
            html.OptionOutputAsXml = true;
            html.LoadHtml(sbb);
            HtmlNode doc = html.DocumentNode;
            foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
            {
                //HtmlAttribute att = link.Attributes["href"];
                string hrefValue = link.GetAttributeValue("href", string.Empty);
                if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
                {
                    int index = hrefValue.IndexOf("&");
                    if (index > 0)
                    {
                        hrefValue = hrefValue.Substring(0, index);
                        listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                    }
                }
            }
        }
    }
}
Can I use this code for all search engines? I changed these lines so that it would work for other search engines:
if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
and
string SearchResults = "http://yahoo.com/search?q=" + textBox1.Text.Trim();
but it doesn't work.
My other problem is that this code returns only the links from the first page. What should I do if I want to return the first N links?
Can anybody help?
First of all, you have more than one question in this topic. Please open a separate topic for each question.
In the case of Yahoo, "http://yahoo.com/search?q=" is not valid: if you try http://yahoo.com/search?q=stackoverflow you don't get the results page. You have to find the search URL for every search engine. For example, Yahoo uses https://search.yahoo.com/search?p= .
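As a rough illustration of keeping one search URL per engine, the sketch below maps an engine name to its base URL and appends the URL-encoded keywords. The class name `SearchUrls`, the method `Build` and the engine keys are hypothetical. The Yahoo URL is the one given above; the Google entry is an assumed HTTPS form of the URL used in the question.

```csharp
using System;
using System.Collections.Generic;

public static class SearchUrls
{
    // Base search URL per engine. The Yahoo entry comes from the answer above;
    // the Google entry is an assumption adapted from the question's code.
    private static readonly Dictionary<string, string> BaseUrls =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "google", "https://www.google.com/search?q=" },
            { "yahoo",  "https://search.yahoo.com/search?p=" },
        };

    // Returns the first-page search URL for the given engine and keywords.
    public static string Build(string engine, string keywords)
    {
        if (!BaseUrls.TryGetValue(engine, out string baseUrl))
            throw new ArgumentException("Unknown search engine: " + engine);

        // Escape spaces, '&', '#' and similar characters so the keywords
        // cannot break out of the query-string value.
        return baseUrl + Uri.EscapeDataString(keywords.Trim());
    }
}
```

In the question's click handler this could replace the string concatenation, for example `SearchUrls.Build("yahoo", txtKeyWords.Text)`.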
You also have to modify this condition for every search engine:
if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
For example, as written it keeps only HTTP links, so HTTPS links are discarded.
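One possible way to rewrite the filter so that HTTPS links are kept is sketched below for the Google case only. It assumes result links have the form `/url?q=<target>&...`, as in the question's code. The names `ResultLinks` and `ExtractGoogleTarget` are hypothetical, and other engines would need their own pattern.

```csharp
using System;

public static class ResultLinks
{
    // Returns the target URL of a "/url?q=<target>&..." link,
    // or null when the href does not look like an external result link.
    public static string ExtractGoogleTarget(string href)
    {
        const string prefix = "/url?q=";
        if (string.IsNullOrEmpty(href) || !href.StartsWith(prefix, StringComparison.Ordinal))
            return null;

        // Keep only the value of q: everything up to the first '&'.
        string target = href.Substring(prefix.Length);
        int amp = target.IndexOf('&');
        if (amp >= 0)
            target = target.Substring(0, amp);

        // The value is percent-encoded inside the redirect link.
        target = Uri.UnescapeDataString(target);

        // Accept both http:// and https:// targets.
        bool isWeb = target.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                  || target.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        if (!isWeb || !Uri.TryCreate(target, UriKind.Absolute, out Uri uri))
            return null;

        // Skip links that point back to the search engine itself.
        if (uri.Host.IndexOf("google", StringComparison.OrdinalIgnoreCase) >= 0)
            return null;

        return target;
    }
}
```

Inside the `foreach` loop, the whole `if` block could then become a call to `ExtractGoogleTarget(hrefValue)` followed by a null check before adding the result to `listBox1`.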
Google uses &start= for pagination and usually returns 10 results per page. So with start=20 you get results 20 to 30: https://www.google.es/search?q=stackoverflow&start=20
Yahoo also returns 10 results per page and uses &b= for pagination. b=1 is the first page, b=11 the second, and so on. Example: https://search.yahoo.com/search?p=stackoverflow&b=11
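To get the first N links, the pagination parameters described above can be turned into a list of page URLs to fetch. The sketch below only builds the URLs. It assumes 10 results per page for both engines (as stated above, but the engines may change this) and that Google's first page corresponds to start=0. The names `Pagination`, `GooglePageUrls` and `YahooPageUrls` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class Pagination
{
    // Assumed page size for both engines; not guaranteed by either site.
    private const int ResultsPerPage = 10;

    // Google: start is a zero-based offset (0, 10, 20, ...).
    public static List<string> GooglePageUrls(string keywords, int wanted)
    {
        var urls = new List<string>();
        string q = Uri.EscapeDataString(keywords);
        for (int start = 0; start < wanted; start += ResultsPerPage)
            urls.Add("https://www.google.es/search?q=" + q + "&start=" + start);
        return urls;
    }

    // Yahoo: b is the one-based index of the first result (1, 11, 21, ...).
    public static List<string> YahooPageUrls(string keywords, int wanted)
    {
        var urls = new List<string>();
        string q = Uri.EscapeDataString(keywords);
        for (int first = 1; first <= wanted; first += ResultsPerPage)
            urls.Add("https://search.yahoo.com/search?p=" + q + "&b=" + first);
        return urls;
    }
}
```

Each URL would then be downloaded and parsed as in the question's code, stopping once N links have been collected, since a page may contain fewer than 10 usable links.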
I hope this helps.