
Search for specific text in html response (ASP.NET Core)

I need to search for a specific word in the HTML of a web page.

I am trying to do this in C# (ASP.NET Core).

The idea is to get the URL and the search word from the View via JS and then, if the word exists in the response, show it; if not, show something else.

I wrote a method that gets the HTML of the page. Here is the code:

[HttpPost]
public JsonResult SearchWord([FromBody] RequestModel model)
{
    // Requires: using System.IO; using System.Net; using System.Text;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(model.adress);

    // The using blocks dispose the response and the reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream receiveStream = response.GetResponseStream())
    {
        // Fall back to UTF-8 (the StreamReader default) when no charset is reported.
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);

        using (StreamReader readStream = new StreamReader(receiveStream, encoding))
        {
            string data = readStream.ReadToEnd();
            string strRegex = model.word; // the word to search for; not used yet

            return Json(data);
        }
    }
}

But how do I search for the word correctly?

You will not get far with simple pattern matching on HTML; see the SO classic, RegEx match open tags except XHTML self-contained tags. Consider using a web scraping library such as html-agility-pack if you want to do serious scraping. If you only want to find a single word in a web page, regardless of whether it sits in markup, CDATA, etc., read the whole response into one string and use string.Contains or a Regex.
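As a minimal sketch of the plain-string route, assuming the page has already been read into a string as in the question, a case-insensitive check could look like this. The names `PageSearch` and `ContainsWord` are hypothetical and only serve the illustration.

```csharp
using System;

public static class PageSearch
{
    // Hypothetical helper: returns true when `word` occurs anywhere in `html`,
    // ignoring case. Tag names and attribute values are searched too, because
    // the markup is treated as plain text.
    public static bool ContainsWord(string html, string word)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(word))
            return false;

        // IndexOf with a StringComparison argument avoids allocating a
        // lower-cased copy of the whole page.
        return html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the action from the question this would be called as `PageSearch.ContainsWord(data, model.word)`. Because the markup itself is searched, a word such as "div" will almost always be found; parsing the page with a library and searching only the text nodes avoids that.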

To add to the previous answer, you can use Regex.Match. Something like:

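// strRegex is the search word from the question. Concatenate it into the pattern
// (escaped, so it is matched literally) instead of writing its name inside the string.
string pattern = @"(\w+)\s+(" + Regex.Escape(strRegex) + ")";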

// Instantiate the regular expression object.
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

// Match the regular expression pattern against your html data.
Match m = r.Match(data);

if (m.Success) {
    //Add your logic here
}
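If the goal is only to find the word itself, without requiring a preceding word as the pattern above does, a whole-word match is one option. This is a sketch under the assumption that the search word starts and ends with a letter or digit, since `\b` only matches next to word characters; `WordMatcher` and `FindWholeWord` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Hypothetical helper: finds the first whole-word, case-insensitive
    // occurrence of `word` in `data`.
    public static Match FindWholeWord(string data, string word)
    {
        // Regex.Escape makes characters such as '.' or '(' in the user's
        // input match literally instead of acting as regex operators.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Match(data, pattern, RegexOptions.IgnoreCase);
    }
}
```

The returned `Match` exposes `Success`, `Index` and `Value`, so the controller can report both whether the word was found and where.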

NOTE: There are quite a few things you can do to optimize your code, specifically in how you handle the stream reader. I would read the response in chunks and try to match each chunk, keeping in mind that a match can straddle the boundary between two chunks.
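One way the chunked approach could be sketched, assuming a plain case-insensitive substring search rather than a regex: read fixed-size blocks and carry the last `word.Length - 1` characters into the next block, so a word split across two reads is still found. `ChunkedSearch`, `ContainsWord` and the 4096-character default are assumptions made for this example.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Hypothetical helper: scans `reader` block by block instead of loading
    // the whole page into memory with ReadToEnd().
    public static bool ContainsWord(TextReader reader, string word, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(word) || chunkSize <= 0)
            return false;

        var buffer = new char[chunkSize];
        string carry = string.Empty; // tail of the previous window
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Prepend the carried tail so matches spanning two reads are visible.
            string window = carry + new string(buffer, 0, read);
            if (window.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;

            // Keep just enough characters to complete a partially read word.
            int keep = Math.Min(word.Length - 1, window.Length);
            carry = window.Substring(window.Length - keep);
        }
        return false;
    }
}
```

With the code from the question, the `readStream` variable could be passed in directly as the `reader` argument. The search stops at the first hit, so the rest of the response is never read.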
