简体   繁体   中英

C# Trying to read a page using XmlNode

So I am trying to read the Steam store page from the lowest price to the highest. I have the URL needed and I have written some code which have worked in the past but does not work anymore. I have spend some days trying to fix this problem but I just can't seem to find the problem.

Link I am trying to read.

Here is the code.

    //List of items from the Steam market from lowest to highest
    private void priceFromMarket(int StartPage)
    {
        if (valueList.Count != 0)
        {
            valueList.Clear();
            numList.Clear();
            nameList.Clear();
        }
        string pageContent = null;
        string results_html = null;
        try
        {
            HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("http://steamcommunity.com/market/search/render/?query=appid:730&start=" + StartPage.ToString() + "&sort_column=price&sort_dir=asc&count=100&currency=1&l=english");
            HttpWebResponse myRes = (HttpWebResponse)myReq.GetResponse();
            using (StreamReader sr = new StreamReader(myRes.GetResponseStream()))
            {
                pageContent = sr.ReadToEnd();
            }
        }
        catch { Thread.Sleep(30000); priceFromMarket(StartPage); }
        if (pageContent == null) { priceFromMarket(StartPage); }
        try
        {
            JObject user = JObject.Parse(pageContent);
            bool success = (bool)user["success"];
            if (success)
            {
                results_html = (string)user["results_html"];
                string data = results_html;
                data = "<root>" + data + "</root>";
                XmlDocument document = new XmlDocument();
                document.LoadXml(System.Net.WebUtility.HtmlDecode(data));
                XmlNode rootnode = document.SelectSingleNode("root");
                XmlNodeList items = rootnode.SelectNodes("./a/div");
                foreach (XmlNode node in items)
                {
                    //This does not work anymore!
                    //The try fails here at line 574!
                    string value = node.SelectSingleNode("./div[contains(concat(' ', @class, ' '), ' market_listing_their_price ')]/span/span").InnerText;
                    string num = node.SelectSingleNode("./div[contains(concat(' ', @class, ' '), ' market_listing_num_listings ')]/span/span").InnerText;
                    string name = node.SelectSingleNode("./div/span[contains(concat(' ', @class, ' '), ' market_listing_item_name ')]").InnerText;
                    valueList.Add(value); //Lowest price for the item
                    numList.Add(num); //Volume of that item
                    nameList.Add(name); //Name of that item
                }
            }
            else { Thread.Sleep(60000); priceFromMarket(StartPage); }
        }
        catch { Thread.Sleep(60000); priceFromMarket(StartPage); }
    }

It's never reliable to parse HTML as XML because HTML doesn't have to be well formatted to be parsed properly...

For parsing HTML in C# i prefer to use CSQuery https://www.nuget.org/packages/CsQuery/

it lets you parse HTML in c# similar to doing it via jquery.

Another way is HTML Agility Pack which you could probably use without changing much of your code.. it's functions are similar to the System.Xml.XmlDocument Library.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM