Find a specific word in a website using ASP.NET C#

How can I find a specific word in a website using ASP.NET C#? For example, if I browse the cnn.com website and want to look for a word such as "sport" on it, how can I find it using ASP.NET C#?

thanks

You can get the webpage as a string with this code:

string webpageData;
using (System.Net.WebClient webClient = new System.Net.WebClient())
    webpageData = webClient.DownloadString("http://www.cnn.com");

Then just use the regular string methods (note that Contains is case-sensitive):

var containsWord = webpageData.Contains("word");
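If you also need the positions of the word, or want to ignore case, a loop over IndexOf is one way to do it. This is only a minimal sketch: the class name WordSearch and the method FindAll are made up for illustration, and the search runs over the raw page source, so it will also hit text inside tags and scripts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class WordSearch
{
    // Returns the zero-based index of every occurrence of `word` in `text`,
    // ignoring case. Substrings count too, so "sport" also matches "sports".
    public static List<int> FindAll(string text, string word)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
            return positions;

        int index = text.IndexOf(word, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            positions.Add(index);
            // Continue searching just past the match that was found.
            index = text.IndexOf(word, index + word.Length, StringComparison.OrdinalIgnoreCase);
        }
        return positions;
    }
}
```

You would call it as WordSearch.FindAll(webpageData, "sport") and check whether the returned list is empty.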

If I understood your question correctly, you want to programmatically browse a website and find the positions of given words. To do this, you can use the WebClient class to load the HTML content of the page and then Regex to match the words you need. Below is an example that loads cnn.com and lists all the links found on the page along with their positions. You can modify the regular expression to return only links that contain the word "sport".

using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Dispose the client as well as the stream and reader when done
using (WebClient client = new WebClient())
using (Stream data = client.OpenRead(@"http://www.cnn.com/"))
using (StreamReader reader = new StreamReader(data))
{
    string content = reader.ReadToEnd();
    // Matches absolute URLs for several schemes
    string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
    MatchCollection matches = Regex.Matches(content, pattern);
    foreach (Match match in matches)
    {
        // Print each link with its character offset in the page source
        Console.WriteLine("'{0}' found at position {1}", match.Value, match.Index);
    }
}
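If you are after the word itself rather than links, the same Regex approach can be narrowed to whole-word matches. The following is a sketch under assumptions: the names WholeWordSearch and FindWholeWord are hypothetical, and \b word boundaries are just one reasonable definition of a "word".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WholeWordSearch
{
    // Returns the position of each whole-word, case-insensitive match of `word`.
    // With \b boundaries, "sport" matches "Sport" but not "sports".
    public static List<int> FindWholeWord(string content, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in `word` literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        var positions = new List<int>();
        foreach (Match match in Regex.Matches(content, pattern, RegexOptions.IgnoreCase))
            positions.Add(match.Index);
        return positions;
    }
}
```

Calling WholeWordSearch.FindWholeWord(content, "sport") on the downloaded page returns the offsets in the raw HTML, not in the text as rendered by a browser.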

You could build a sort of "crawler" in C# that downloads the home page and then recursively follows the links on it. The crawler would fetch the HTML source of each page, on which you could run a simple text search. It is a pretty crude approach, but it could work.
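A rough sketch of that crawler idea might look like the code below. Everything here is an assumption made for illustration: the names MiniCrawler and FindPagesContaining, the maxPages limit, and the naive href regex, which only picks up absolute http(s) links. The download step is passed in as a delegate so you can plug in WebClient or anything else.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class MiniCrawler
{
    // Visits pages breadth-first starting at startUrl, stopping after maxPages
    // pages, and returns the URLs whose HTML contains `word` (case-insensitive).
    // `fetch` downloads one page and returns its HTML.
    public static List<string> FindPagesContaining(
        string startUrl, string word, int maxPages, Func<string, string> fetch)
    {
        var hits = new List<string>();
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(startUrl);

        while (queue.Count > 0 && visited.Count < maxPages)
        {
            string url = queue.Dequeue();
            if (!visited.Add(url))
                continue; // already processed this URL

            string html;
            try { html = fetch(url); }
            catch (Exception) { continue; } // skip pages that fail to download
            if (html == null)
                continue;

            if (html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(url);

            // Naive link extraction: absolute http(s) URLs in href="..." attributes.
            foreach (Match m in Regex.Matches(html, "href=\"(https?://[^\"]+)\"", RegexOptions.IgnoreCase))
                queue.Enqueue(m.Groups[1].Value);
        }
        return hits;
    }
}
```

With a WebClient instance named client, the call could be MiniCrawler.FindPagesContaining("http://www.cnn.com/", "sport", 20, url => client.DownloadString(url)). A real crawler would also want to stay on the same host, resolve relative links, and respect robots.txt, none of which this sketch does.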
