C# Trying to read a page using XmlNode

I am trying to read the Steam store page, sorted from the lowest price to the highest. I have the URL I need, and I wrote some code that worked in the past but no longer works. I have spent several days trying to fix this, but I can't find the problem.

Link I am trying to read.

Here is the code.

    //List of items from the Steam market from lowest to highest
    private void priceFromMarket(int StartPage)
    {
        if (valueList.Count != 0)
        {
            valueList.Clear();
            numList.Clear();
            nameList.Clear();
        }
        string pageContent = null;
        string results_html = null;
        try
        {
            HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("http://steamcommunity.com/market/search/render/?query=appid:730&start=" + StartPage.ToString() + "&sort_column=price&sort_dir=asc&count=100&currency=1&l=english");
            HttpWebResponse myRes = (HttpWebResponse)myReq.GetResponse();
            using (StreamReader sr = new StreamReader(myRes.GetResponseStream()))
            {
                pageContent = sr.ReadToEnd();
            }
        }
        catch { Thread.Sleep(30000); priceFromMarket(StartPage); }
        if (pageContent == null) { priceFromMarket(StartPage); }
        try
        {
            JObject user = JObject.Parse(pageContent);
            bool success = (bool)user["success"];
            if (success)
            {
                results_html = (string)user["results_html"];
                string data = results_html;
                data = "<root>" + data + "</root>";
                XmlDocument document = new XmlDocument();
                document.LoadXml(System.Net.WebUtility.HtmlDecode(data));
                XmlNode rootnode = document.SelectSingleNode("root");
                XmlNodeList items = rootnode.SelectNodes("./a/div");
                foreach (XmlNode node in items)
                {
                    //This does not work anymore!
                    //The try fails here at line 574!
                    string value = node.SelectSingleNode("./div[contains(concat(' ', @class, ' '), ' market_listing_their_price ')]/span/span").InnerText;
                    string num = node.SelectSingleNode("./div[contains(concat(' ', @class, ' '), ' market_listing_num_listings ')]/span/span").InnerText;
                    string name = node.SelectSingleNode("./div/span[contains(concat(' ', @class, ' '), ' market_listing_item_name ')]").InnerText;
                    valueList.Add(value); //Lowest price for the item
                    numList.Add(num); //Volume of that item
                    nameList.Add(name); //Name of that item
                }
            }
            else { Thread.Sleep(60000); priceFromMarket(StartPage); }
        }
        catch { Thread.Sleep(60000); priceFromMarket(StartPage); }
    }

Parsing HTML as XML is never reliable, because HTML does not have to be well-formed to be parsed properly, whereas an XML parser rejects any input that is not well-formed.
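To make that concrete, here is a minimal sketch that uses only System.Xml. The class name `XmlVersusHtml`, the method `LoadsAsXml`, and the sample markup are made up for illustration; they are not taken from the Steam response.

```csharp
using System.Xml;

public static class XmlVersusHtml
{
    // Returns true when the markup can be loaded by XmlDocument,
    // false when the XML parser rejects it.
    public static bool LoadsAsXml(string markup)
    {
        try
        {
            var document = new XmlDocument();
            // Wrap in a single root element, as the question's code does.
            document.LoadXml("<root>" + markup + "</root>");
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this helper, `LoadsAsXml("<div><br/></div>")` succeeds, while `LoadsAsXml("<div><br></div>")` (an unclosed tag) and `LoadsAsXml("<span>Tom & Jerry</span>")` (a bare ampersand) both fail, even though a browser would accept all three.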

For parsing HTML in C#, I prefer to use CsQuery: https://www.nuget.org/packages/CsQuery/

It lets you query HTML in C# in much the same way as you would with jQuery.
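A rough sketch of what that could look like, assuming the CsQuery NuGet package is referenced. The class `MarketNameReader` and its method are hypothetical names, and the `market_listing_item_name` class is taken from the XPath in the question, so it may not match the markup Steam returns today.

```csharp
using System.Collections.Generic;
using CsQuery;

public static class MarketNameReader
{
    // Collects the text of every element carrying the item-name class.
    public static List<string> ReadItemNames(string resultsHtml)
    {
        var names = new List<string>();

        // CQ.Create parses the HTML fragment; no <root> wrapper is needed.
        CQ dom = CQ.Create(resultsHtml);

        // CSS selector, jQuery style, instead of the XPath class test.
        CQ matches = dom[".market_listing_item_name"];

        for (int i = 0; i < matches.Length; i++)
        {
            names.Add(matches.Eq(i).Text().Trim());
        }
        return names;
    }
}
```

In the question's code, this would be called with the `results_html` string right after the JSON has been parsed.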

Another option is HTML Agility Pack, which you could probably use without changing much of your code; its functions are similar to those of the System.Xml.XmlDocument library.
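As a sketch of how small the change might be, the loop from the question could be ported roughly as follows, assuming the HtmlAgilityPack NuGet package is referenced. The types `MarketListing` and `MarketHtmlParser` are hypothetical, and the XPath expressions are copied from the question, so they will only match if the page still uses those class names and that nesting.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public sealed class MarketListing
{
    public string Name { get; set; }
    public string Price { get; set; }
    public string Quantity { get; set; }
}

public static class MarketHtmlParser
{
    // Builds the same class-token test that the question's XPath uses.
    private static string HasClass(string className)
    {
        return "contains(concat(' ', @class, ' '), ' " + className + " ')";
    }

    public static List<MarketListing> Parse(string resultsHtml)
    {
        var listings = new List<MarketListing>();

        var document = new HtmlDocument();
        document.LoadHtml(resultsHtml); // tolerates markup that is not well-formed XML

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows = document.DocumentNode.SelectNodes("//a/div");
        if (rows == null) return listings;

        foreach (HtmlNode row in rows)
        {
            HtmlNode price = row.SelectSingleNode(
                "./div[" + HasClass("market_listing_their_price") + "]/span/span");
            HtmlNode quantity = row.SelectSingleNode(
                "./div[" + HasClass("market_listing_num_listings") + "]/span/span");
            HtmlNode name = row.SelectSingleNode(
                "./div/span[" + HasClass("market_listing_item_name") + "]");

            // Skip rows with a different layout instead of dereferencing null.
            if (price == null || quantity == null || name == null) continue;

            listings.Add(new MarketListing
            {
                Name = HtmlEntity.DeEntitize(name.InnerText).Trim(),
                Price = HtmlEntity.DeEntitize(price.InnerText).Trim(),
                Quantity = HtmlEntity.DeEntitize(quantity.InnerText).Trim()
            });
        }
        return listings;
    }
}
```

Checking each `SelectSingleNode` result for null matters with either library: when an expression matches nothing, reading `.InnerText` on the result throws, which would land in the question's catch block and trigger the retry.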
