
Search for specific text in HTML response (ASP.NET Core)

I need to search for a specific word in the HTML of a web page.

I am trying to do this in C# (ASP.NET Core).

The idea is to get the URL and the search word from the View via JS, and then return the word in the response if it exists on the page, or some other message if it does not.

I wrote a method that fetches the HTML of the page. Here is the code:

    [HttpPost]
    public JsonResult SearchWord([FromBody] RequestModel model)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(model.adress);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;

        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }

        string data = readStream.ReadToEnd();
        string strRegex = model.word;

        response.Close();
        readStream.Close();
        return Json(data);
    }

But how do I search for the word correctly?

Simple pattern matching will not get you very far on HTML; see the SO classic "RegEx match open tags except XHTML self-contained tags". If you want to do serious scraping, consider a web scraping library such as html-agility-pack. If you only need to know whether a single word appears anywhere in the page, regardless of whether it sits in markup, CDATA, etc., read the whole response into one string and use string.Contains or Regex.
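As a rough illustration of the "single word anywhere in the page" case, the sketch below does a case-insensitive substring check over the raw HTML. The class and method names (`PageSearch`, `ContainsWord`) are made up for this example and are not part of the question's code; it uses `IndexOf` with `StringComparison.OrdinalIgnoreCase`, which behaves like a case-insensitive `string.Contains`.

```csharp
using System;

public static class PageSearch
{
    // Hypothetical helper: returns true if `word` occurs anywhere in `html`,
    // including inside tags, attributes, scripts, or comments.
    // The comparison ignores case.
    public static bool ContainsWord(string html, string word)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(word))
        {
            return false;
        }

        return html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the action from the question, this could be called as `PageSearch.ContainsWord(data, model.word)` and the boolean returned through `Json(...)`. Because the check runs over the raw markup, a word that only appears in a class name or script will also count as found.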

To add to the previous answer, you can use Regex.Match. Something like:

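// Requires: using System.Text.RegularExpressions;
// strRegex holds model.word, as in the question's code. Regex.Escape makes
// characters such as '.' or '+' in the search word match literally.
string pattern = @"(\w+)\s+(" + Regex.Escape(strRegex) + ")";

// Instantiate the regular expression object.
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

// Match the regular expression pattern against your html data.
Match m = r.Match(data);

if (m.Success) {
    // Add your logic here; m.Groups[2].Value is the matched word.
}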
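Note that the pattern above only matches when another word and some whitespace come directly before the search term. If the goal is simply to find the word on its own, a variant with word boundaries may be closer to what is needed. The following is a minimal sketch under that assumption; `WordMatcher` and `CountWholeWord` are hypothetical names, and the `\b` anchors assume the search word starts and ends with a letter, digit, or underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Hypothetical helper: counts whole-word, case-insensitive
    // occurrences of `word` in `text`.
    public static int CountWholeWord(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
        {
            return 0;
        }

        // Regex.Escape keeps characters such as '.' or '+' in the
        // user-supplied word from being treated as regex syntax.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";

        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

With this variant, searching for "cat" counts "cat" and "Cat" but not "concat". Like any match over raw HTML, it still sees tag and attribute names as ordinary text.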

NOTE: There are quite a few things you can do to optimize your code, specifically in how you handle the stream reader. I would read the response in chunks and try to match each chunk.
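One way the chunked approach could look is sketched below. It is an illustration rather than a drop-in replacement: `ChunkedSearch`, its `Contains` method, and the default chunk size of 4096 characters are all assumptions made for this example. The detail that is easy to miss is that a word can straddle two chunks, so the sketch carries the tail of each window over into the next one.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Hypothetical helper: reads `reader` in fixed-size chunks and returns
    // true as soon as `word` is found (ignoring case), so the whole page
    // does not have to be held in memory at once.
    public static bool Contains(TextReader reader, string word, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(word))
        {
            return false;
        }
        if (chunkSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        char[] buffer = new char[chunkSize];
        string carry = string.Empty; // tail of the previous window
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            string window = carry + new string(buffer, 0, read);
            if (window.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }

            // Keep the last word.Length - 1 characters so that a match
            // split across two chunks is still seen in the next window.
            int keep = Math.Min(word.Length - 1, window.Length);
            carry = window.Substring(window.Length - keep);
        }

        return false;
    }
}
```

Since `StreamReader` derives from `TextReader`, the `readStream` from the question could be passed in directly instead of calling `ReadToEnd()`. The early return also means the rest of the response is not read once the word has been found.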


 