I need to search for a specific word in an HTML web page.
I am trying to do this in C# (ASP.NET Core).
The idea is to send the URL and the search word from the view via JavaScript, and then show the word in the response if it exists, or show something else if it does not.
I wrote a method that fetches the HTML of the page. Here is the code:
// Requires: using System.IO; using System.Net; using System.Text; using Microsoft.AspNetCore.Mvc;
[HttpPost]
public JsonResult SearchWord([FromBody] RequestModel model)
{
    // Download the page at the URL sent from the view.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(model.adress);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream receiveStream = response.GetResponseStream())
    {
        // Use the charset reported by the server, if any; otherwise fall back to UTF-8.
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);
        using (StreamReader readStream = new StreamReader(receiveStream, encoding))
        {
            string data = readStream.ReadToEnd();
            string strRegex = model.word; // the word to search for (not used yet)
            return Json(data);
        }
    }
}
But how do I search for the word correctly?
You will not get far parsing HTML with simple pattern matching; see the SO classic "RegEx match open tags except XHTML self-contained tags". Consider using a web scraping library such as html-agility-pack if you want to do serious scraping. If you only want to check whether a single word appears anywhere in the page, regardless of whether it sits in markup, CDATA, etc., read the whole page into one string and use string.Contains or Regex.
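As a minimal sketch of the plain-string approach, the helper below checks whether the word occurs anywhere in the downloaded text, ignoring case. The class and method names (WordSearch, ContainsWord) are made up for illustration and are not part of the question's code.

```csharp
using System;

public static class WordSearch
{
    // Returns true if `word` occurs anywhere in `html`, ignoring case.
    // This looks at the raw markup, so tag names and attribute values
    // count as matches too.
    public static bool ContainsWord(string html, string word)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(word))
        {
            return false;
        }
        return html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the question's controller this could be called as WordSearch.ContainsWord(data, model.word), returning the boolean result instead of the whole page.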
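To add to the previous answer, you can use Regex.Match. Something like:
// Requires: using System.Text.RegularExpressions;
// Build the pattern from the search word; Regex.Escape keeps any special characters in it literal.
string pattern = @"(\w+)\s+(" + Regex.Escape(strRegex) + ")";
// Instantiate the regular expression object.
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
// Match the regular expression pattern against your HTML data.
Match m = r.Match(data);
if (m.Success) {
    // Add your logic here
}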
NOTE: There are quite a few things you can do to optimize your code, specifically in how you handle the stream reader. I would read the response in chunks and try to match against each chunk instead of loading the whole page first.
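One way the chunked reading could look is sketched below. It uses a plain case-insensitive substring search rather than a regex, and it carries the last few characters of each chunk over to the next read so that a word split across a chunk boundary is still found. The names ChunkedSearch and ContainsWord and the default chunk size of 4096 are assumptions for illustration.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Reads `reader` in chunks and returns true as soon as `word` is found,
    // ignoring case. Stops reading at the first match.
    public static bool ContainsWord(TextReader reader, string word, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(word)) return false;
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        char[] buffer = new char[chunkSize];
        // Holds the last (word.Length - 1) characters seen so far, so a match
        // that straddles two chunks is not missed.
        string tail = string.Empty;
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            string window = tail + new string(buffer, 0, read);
            if (window.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
            int keep = Math.Min(word.Length - 1, window.Length);
            tail = window.Substring(window.Length - keep);
        }
        return false;
    }
}
```

Since StreamReader derives from TextReader, the question's readStream could be passed in directly, for example ChunkedSearch.ContainsWord(readStream, model.word), in place of the ReadToEnd call.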