Search for specific text in an HTML response (ASP.NET Core)
I need to search for a specific word in the HTML of a web page, and I'm trying to do it in C# (ASP.NET Core).
My plan is to get the URL and the word to search for from the View via JS. Then, in the response, if the word exists I show it; if not, I show something else.
I wrote a method that gets the HTML code of the page. Here is the code:
using System.IO;
using System.Net;
using System.Text;
using Microsoft.AspNetCore.Mvc;

[HttpPost]
public JsonResult SearchWord([FromBody] RequestModel model)
{
    // Download the page at the URL sent from the view.
    var request = (HttpWebRequest)WebRequest.Create(model.adress);
    // The using blocks dispose the response, stream and reader even if reading fails.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (Stream receiveStream = response.GetResponseStream())
    // Use the charset from the response headers when the server provides one.
    using (var readStream = string.IsNullOrEmpty(response.CharacterSet)
        ? new StreamReader(receiveStream)
        : new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet)))
    {
        string data = readStream.ReadToEnd();
        string strRegex = model.word; // the word to search for (not used yet)
        return Json(data);
    }
}
But how do I search for the word correctly?
You won't be able to do much with simple pattern matching; check out this SO classic: RegEx match open tags except XHTML self-contained tags. Consider using a web scraping library such as html-agility-pack if you want to do some serious scraping.
If you only want to search for a single word in a web page, regardless of whether it appears in markup, CDATA, etc., just join all the characters into one string (ReadToEnd already gives you one) and use string.Contains or Regex.
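A minimal sketch of the plain substring approach, assuming the HTML has already been read into a string (as `data` is in the question). The class name `PageSearch` is hypothetical, and case-insensitive matching is an assumption; use `StringComparison.Ordinal` instead if case matters. Because this searches the raw markup, it also matches the word inside tag names, attributes, and scripts.

```csharp
using System;

// Hypothetical helper class, not part of ASP.NET Core.
public static class PageSearch
{
    // Returns true if `word` occurs anywhere in the raw HTML, including inside
    // tags, attributes, scripts and CDATA sections (plain substring search).
    public static bool ContainsWord(string html, string word)
    {
        if (html is null) throw new ArgumentNullException(nameof(html));
        if (string.IsNullOrEmpty(word)) return false;
        // Assumption: the search should ignore case.
        return html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the question's action this could be called as `PageSearch.ContainsWord(data, model.word)` and the boolean returned via `Json(...)`.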
To add to the previous answer, you can use Regex.Match. Something like:
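using System.Text.RegularExpressions;

// Build the pattern from the value of strRegex (the original pattern matched the literal
// text "strRegex"). Regex.Escape makes characters such as '.' or '+' match literally,
// and \b restricts matches to whole words.
string pattern = @"\b" + Regex.Escape(strRegex) + @"\b";
// Instantiate the regular expression object.
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
// Match the regular expression pattern against your HTML data.
Match m = r.Match(data);
if (m.Success)
{
    // Add your logic here; m.Value is the matched text and m.Index its position.
}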
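Since the question wants to "show" the word when it exists, one option is to return the match together with a bit of surrounding text. The sketch below assumes whole-word, case-insensitive matching; `WordFinder` and `FindWithContext` are hypothetical names, and the `context` default of 20 characters is arbitrary. Note that `\b` only behaves as a word boundary when the search word starts and ends with a letter, digit, or underscore.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class WordFinder
{
    // Finds the first whole-word, case-insensitive occurrence of `word` in `data`
    // and returns it with up to `context` characters on each side,
    // or null when the word is not present.
    public static string FindWithContext(string data, string word, int context = 20)
    {
        if (data is null) throw new ArgumentNullException(nameof(data));
        if (string.IsNullOrEmpty(word)) return null;
        // Escape the user-supplied word so regex metacharacters are matched literally.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        Match m = Regex.Match(data, pattern, RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        // Clamp the snippet to the bounds of the input string.
        int start = Math.Max(0, m.Index - context);
        int end = Math.Min(data.Length, m.Index + m.Length + context);
        return data.Substring(start, end - start);
    }
}
```

For example, `FindWithContext("<p>Hello World, again</p>", "world", 3)` returns `"lo World, a"`, while `"Worldwide"` does not match because of the word boundaries.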
NOTE: There are quite a few things you can do to optimize your code, specifically in how you handle the stream reader. I would read in chunks and try to match each chunk.
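A minimal sketch of the chunked idea, assuming a plain case-insensitive substring search (not regex). The one subtlety is that a match can straddle two chunks, so the last `word.Length - 1` characters of each window are carried into the next one. `ChunkedSearch` and the 4096-character default chunk size are hypothetical choices.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class ChunkedSearch
{
    // Scans a TextReader in fixed-size chunks instead of calling ReadToEnd,
    // carrying over the last (word.Length - 1) characters so that a match split
    // across two chunks is still found. Stops reading at the first match.
    public static bool ContainsWord(TextReader reader, string word, int chunkSize = 4096)
    {
        if (reader is null) throw new ArgumentNullException(nameof(reader));
        if (string.IsNullOrEmpty(word)) return false;
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new char[chunkSize];
        string carry = string.Empty;
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Prepend the tail of the previous window to the newly read characters.
            string window = carry + new string(buffer, 0, read);
            if (window.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
            // Keep only as many trailing characters as could begin a split match.
            int keep = Math.Min(word.Length - 1, window.Length);
            carry = window.Substring(window.Length - keep);
        }
        return false;
    }
}
```

Since `StreamReader` derives from `TextReader`, the question's `readStream` could be passed directly, e.g. `ChunkedSearch.ContainsWord(readStream, model.word)`, instead of reading the whole page with `ReadToEnd`.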