
Sort List by date values

I have the following list -

List<string> finalMessageContent

where

finalMessageContent[0] = @"<div class=""mHr"" id=""mFID"">
   <div id=""postedDate"">11/12/2015 11:12:16</div>
</div>"; // etc etc

I am trying to sort the list by a particular value located in each entry: the date inside the postedDate div.

First I created a new object and then serialized it so that the HTML elements could be parsed -

string[][] newfinalMessageContent = finalMessageContent.Select(x => new string[] { x }).ToArray();

string json = JsonConvert.SerializeObject(newfinalMessageContent);
JArray markerData = JArray.Parse(json);

Then I used LINQ to try to sort with OrderByDescending -

var items = markerData.OrderByDescending(x => x["postedDate"].ToString()).ToList();

However, this fails when it tries to read the entry, with -

Accessed JArray values with invalid key value: "postedDate". Array position index expected.

Perhaps LINQ is not the way to go here, although it seemed like the most efficient option. Where am I going wrong?

First, I would not use string methods, regex or a JSON parser to parse HTML. I would use HtmlAgilityPack. Then you could provide a method like this:

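// Requires: using System; using System.Globalization; using HtmlAgilityPack;
private static DateTime? ExtractPostedDate(string inputHtml, string controlID = "postedDate")
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(inputHtml);
    // Look up the element by its id attribute; null if it is missing
    HtmlNode div = doc.GetElementbyId(controlID);
    DateTime? result = null;
    DateTime value;
    // The invariant culture reads 11/12/2015 as month/day/year (November 12)
    if (div != null && DateTime.TryParse(div.InnerText.Trim(), DateTimeFormatInfo.InvariantInfo, DateTimeStyles.None, out value))
        result = value;
    return result;
}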

and the following LINQ query, which drops entries without a parsable date and sorts the rest newest first:

finalMessageContent = finalMessageContent
    .Select(s => new { String = s, Date = ExtractPostedDate(s) })
    .Where(x => x.Date.HasValue)
    .OrderByDescending(x => x.Date.Value)
    .Select(x => x.String)
    .ToList();

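The JSON serializer only turns objects into JSON text. It does not parse HTML, so each serialized entry is still a plain string and never gains a "postedDate" key.

To parse HTML I suggest using HtmlAgilityPack: https://htmlagilitypack.codeplex.com/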
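
To see where the error in the question comes from, here is a minimal sketch that repeats the question's serialize-then-parse steps and reports what the resulting JArray holds. It assumes Newtonsoft.Json (which the question already uses); the class name JsonShapeSketch and the method Describe are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonShapeSketch
{
    // Hypothetical helper: repeats the question's steps and describes the token shape.
    public static string Describe(List<string> finalMessageContent)
    {
        // Same wrapping as in the question: one single-element array per entry.
        string[][] wrapped = finalMessageContent.Select(x => new[] { x }).ToArray();
        string json = JsonConvert.SerializeObject(wrapped);
        JArray markerData = JArray.Parse(json);

        JToken entry = markerData[0]; // the inner one-element array
        JToken html = entry[0];       // the HTML, still one opaque string

        // entry is a JArray, so entry["postedDate"] throws; only integer indexes work.
        return entry.Type + " containing " + html.Type;
    }
}
```

Called with a list holding the sample entry from the question, Describe returns "Array containing String": every element of markerData is itself an array, which is why indexing it with "postedDate" fails with "Array position index expected".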

Like this:

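// Requires: using System; using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
HtmlAgilityPack.HtmlDocument htmlParsed = new HtmlAgilityPack.HtmlDocument();
htmlParsed.LoadHtml(finalMessageContent[0]);
List<HtmlNode> orderedDivs = htmlParsed.DocumentNode.Descendants("div")
    .Where(a => a.Attributes.Any(af => af.Value == "postedDate"))
    .OrderByDescending(d => DateTime.Parse(d.InnerText)) // unsafe parsing: throws on an invalid date
    .ToList(); // OrderByDescending returns an IOrderedEnumerable, so materialize it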

I'm not sure I understood your question correctly, but did you know that you can query HTML with XPath?

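// doc is assumed to be an HtmlAgilityPack.HtmlDocument with the HTML already loaded.
// SelectNodes returns null when nothing matches, so check for null in real code.
foreach (var row in doc.DocumentNode.SelectNodes("//div[@id='postedDate']"))
{
    Console.WriteLine(row.InnerText);
}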

This is just an example off the top of my head; you might have to double-check the XPath query depending on your document. You can also consider converting the result to an array, or parsing the date and applying other transformations to it.

Alternatively, if the HTML is not too complex, consider extracting the dates with a regex, but that would be a topic for another question.
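
For the regex route, a rough sketch using only the .NET base class library might look like the following. It rests on two assumptions that are not confirmed by the question: the date sits directly inside the postedDate div with no nested tags, and the format is month/day/year with a 24-hour time. The class name PostedDateRegexSketch and its methods are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PostedDateRegexSketch
{
    // Assumption: <div id="postedDate">...</div> contains only text, no nested tags.
    private static readonly Regex PostedDate = new Regex(
        "<div\\s+id=\"postedDate\"\\s*>\\s*(?<date>[^<]+?)\\s*</div>",
        RegexOptions.IgnoreCase);

    // Assumption: month/day/year with a 24-hour time, e.g. 11/12/2015 11:12:16.
    private const string Format = "MM/dd/yyyy HH:mm:ss";

    // Returns the posted date, or null when the div is missing or the date is malformed.
    public static DateTime? Extract(string html)
    {
        Match m = PostedDate.Match(html);
        DateTime value;
        if (m.Success && DateTime.TryParseExact(m.Groups["date"].Value, Format,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out value))
            return value;
        return null;
    }

    // Drops entries without a parsable date and returns the rest newest first.
    public static List<string> SortNewestFirst(IEnumerable<string> entries)
    {
        return entries
            .Select(s => new { Html = s, Date = Extract(s) })
            .Where(x => x.Date.HasValue)
            .OrderByDescending(x => x.Date.Value)
            .Select(x => x.Html)
            .ToList();
    }
}
```

Usage would be finalMessageContent = PostedDateRegexSketch.SortNewestFirst(finalMessageContent);. The pattern is deliberately strict, so any change in the markup (extra attributes on the div, nested elements) makes Extract return null, which is the usual reason to prefer a real HTML parser.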

HTH

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint one, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 