
Data extraction from a generated <script> and process the results


    using System.Linq;
    using System.Net;
    using System.Text.RegularExpressions;
    using HtmlAgilityPack;

    string Url = "https://www.audiusa.com/dealers-webapp/map/dealer/423E99";
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(Url);
    // Keep only the <script> nodes that define the Audi variables.
    var scriptGoogleTagManager = doc.DocumentNode.SelectNodes("//script")
        .Where(x => x.InnerHtml.Contains("window.Audi.Vars.searchType"));
    if (scriptGoogleTagManager.Any())
    {
        foreach (var tag in scriptGoogleTagManager)
        {
            var s = tag.InnerText;
            // My attempt: this matches only the assignment prefix and does not capture the value.
            Regex r = new Regex("\\s+window\\.Audi\\.Vars\\.searchResult\\s+\\=\\s+");
            Match m = r.Match(s.ToLower());
        }
    }

In the script above I want to extract the values that follow window.Audi.Vars.searchResult = and window.Audi.Vars.dealers = . I am having trouble with the regex because I don't have much experience with it. Please help.

I understand you want to get rid of the leading text, e.g.
window.Audi.Vars.searchResult =

    var extract = s.Substring(31); // C# equivalent of JavaScript's s.slice(31); the string "window.Audi.Vars.searchResult =" has 31 chars

The slice() method extracts part of a string and returns the extracted part as a new string. Use the start and end parameters to specify the part of the string you want to extract. Here we only give the start parameter, so it extracts to the end of the string. The first character has position 0, the second has position 1, and so on. Note that slice() is a JavaScript string method; in C# the equivalent call with only a start index is Substring(31).

IMHO regex is good for replacing or removing characters in a string; here a simpler method works.
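A fixed offset of 31 only works when the script text starts exactly with the prefix. As a minimal sketch of the same idea, the prefix can be located first and the cut made right after it. The class and method names here (PrefixSlicer, AfterPrefix) are hypothetical, and the sketch assumes the prefix appears once in the script with exactly this spelling and spacing.

```csharp
using System;

public static class PrefixSlicer
{
    // Returns everything after the first occurrence of `prefix` in `text`,
    // with leading whitespace trimmed. Returns an empty string when the
    // prefix is not found.
    public static string AfterPrefix(string text, string prefix)
    {
        // Ordinal comparison: the prefix is matched character for character.
        int start = text.IndexOf(prefix, StringComparison.Ordinal);
        if (start < 0)
        {
            return string.Empty;
        }
        return text.Substring(start + prefix.Length).TrimStart();
    }
}
```

Inside the loop this could be called as PrefixSlicer.AfterPrefix(s, "window.Audi.Vars.searchResult ="). The result still contains whatever follows the value in the script (the closing semicolon and any later statements), so it needs further trimming before it can be parsed.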

Modify your code and post the console result:

    var scriptGoogleTagManager = doc.DocumentNode.SelectNodes("//script")
        .Where(x => x.InnerHtml.Contains("window.Audi.Vars.searchType"));
    if (scriptGoogleTagManager.Any())
    {
        foreach (var tag in scriptGoogleTagManager)
        {
            var s = tag.InnerText;
            Console.WriteLine("[content of s] " + s); // print the raw script text
            var extract = s.Substring(31); // skip the first 31 chars (JavaScript: s.slice(31))
        }
    }
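If the regex route from the question is preferred, one possible approach is a pattern with a named capture group, so that the value itself is returned rather than only the assignment prefix. The sketch below is not taken from the thread: the names ScriptVarExtractor and TryExtract are hypothetical, and it assumes each assignment ends with a semicolon followed by a line break or the end of the script. The real layout of the page's script is not shown in the question, so the pattern may need adjusting.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptVarExtractor
{
    // Tries to read the text assigned to `variableName` inside `script`.
    // Assumption: the value ends at the first ';' that is followed by a
    // line break or by the end of the script.
    public static bool TryExtract(string script, string variableName, out string value)
    {
        // Regex.Escape turns the dots in the variable name into literal dots.
        string pattern = Regex.Escape(variableName)
            + @"\s*=\s*(?<value>.+?)\s*;\s*(?:\r?\n|$)";

        // Singleline lets '.' match line breaks, so multi-line values work.
        Match m = Regex.Match(script, pattern, RegexOptions.Singleline);
        value = m.Success ? m.Groups["value"].Value : string.Empty;
        return m.Success;
    }
}
```

Inside the loop this could be used as ScriptVarExtractor.TryExtract(tag.InnerText, "window.Audi.Vars.searchResult", out var searchResult), and the same call with "window.Audi.Vars.dealers" for the second value. Note that the script text is passed as-is, without ToLower(), because the pattern is case-sensitive.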
