
How can I determine which value occurs the most in my collection?

I have a JSON file that contains a list of fruits. The "fruits" key can map to a single fruit or to a collection of fruits.

For example:

[
    {
        "fruits": [
            "banana"
        ]
    },
    {
        "fruits": [
            "apple"
        ]
    },
    {
        "fruits": [
            "orange",
            "apple"
        ]
    }
]

How can I determine which fruit(s) occur the most in my JSON structure? That is, how do I find out how often each value occurs and which one leads the others?

Not sure if you're interested in having a class to deserialize into, but here's how you would do it. Feel free to skip the class and use dynamic deserialization:

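using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

class FruitCollection
{
    // Must be public for Json.NET to populate it by default
    public string[] Fruits { get; set; }
}

// The JSON root is an array, so deserialize into a list of FruitCollection
var fruitColls = JsonConvert.DeserializeObject<List<FruitCollection>>(json);
var mostCommon = fruitColls
    .SelectMany(fc => fc.Fruits)       // Flatten all fruit arrays into one sequence
    .GroupBy(f => f)                   // One group per distinct fruit
    .OrderByDescending(g => g.Count()) // Largest group first
    .First()
    .Key;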

EDIT:

This question's pretty old, but I'll mention that the OrderByDescending/First combination does redundant work: you don't really need to sort to get the maximum. It's an age-old lazy hack that people kept using because LINQ did not provide a nice MaxBy extension method (Enumerable.MaxBy was only added in .NET 6).

Usually your input is small enough, and the other work adds enough overhead, that you don't really care. But the "correct" way (e.g. if you had billions of fruit types) is to use a proper MaxBy extension method or hack something out of Aggregate. Finding the max is linear in the worst case, whereas sorting is O(n log n) in the worst case.
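As a rough sketch of the "hack something out of Aggregate" idea, the following replaces the sort with a single pass over the groups. The class name MaxByDemo and the method MostCommon are made up for illustration and are not part of LINQ or Json.NET; the sketch also assumes a non-empty input, since Aggregate without a seed throws InvalidOperationException on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxByDemo
{
    // Hypothetical helper (not a library method): returns the most frequent
    // item without sorting the groups.
    public static string MostCommon(IEnumerable<string> items)
    {
        return items
            .GroupBy(x => x)
            .Select(g => new { Value = g.Key, Count = g.Count() })
            // Linear scan: keep whichever candidate has the larger count.
            // On a tie the earlier candidate is kept.
            .Aggregate((best, next) => next.Count > best.Count ? next : best)
            .Value;
    }
}
```

Calling MaxByDemo.MostCommon(new[] { "banana", "apple", "orange", "apple" }) should return "apple". On .NET 6 or later, the built-in MaxBy(g => g.Count()) on the grouped sequence would likely be the simpler choice.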

If you use Json.NET, you can load your JSON using LINQ to JSON, then use SelectTokens to recursively find all "fruits" properties, recursively collect all descendant string values (those of type JValue), group them by their string value, and put the groups in descending order of count:

        var token = JToken.Parse(jsonString);

        var fruits = token.SelectTokens("..fruits")  // Recursively find all "fruits" properties
            .SelectMany(f => f.DescendantsAndSelf()) // Recursively find all values underneath each
            .OfType<JValue>()                        // Keep only primitive values (the string literals)
            .GroupBy(f => (string)f)                 // Group by string value
            .OrderByDescending(g => g.Count())       // Descending order by count
            .ToList();

Or, if you prefer to put your results into an anonymous type for clarity:

        var fruits = token.SelectTokens("..fruits")  // Recursively find all "fruits" properties
            .SelectMany(f => f.DescendantsAndSelf()) // Recursively find all values underneath each
            .OfType<JValue>()                        // Keep only primitive values (the string literals)
            .GroupBy(f => (string)f)                 // Group by string value
            .Select(g => new { Fruit = g.Key, Count = g.Count() })
            .OrderByDescending(f => f.Count)         // Descending order by count
            .ToList();

Then afterwards:

        Console.WriteLine(JsonConvert.SerializeObject(fruits, Formatting.Indented));

Produces:

[
  {
    "Fruit": "apple",
    "Count": 2
  },
  {
    "Fruit": "banana",
    "Count": 1
  },
  {
    "Fruit": "orange",
    "Count": 1
  }
]

Update:

I forgot to include the following extension method:

public static class JsonExtensions
{
    public static IEnumerable<JToken> DescendantsAndSelf(this JToken node)
    {
        if (node == null)
            return Enumerable.Empty<JToken>();
        var container = node as JContainer;
        if (container != null)
            return container.DescendantsAndSelf();
        else
            return new [] { node };
    }
}

The original question was a little vague on the precise structure of the JSON, which is why I suggested using LINQ to JSON rather than deserialization.

The serialization class for the structure shown in the question is simple, since each array element is an object with a "fruits" list:

public class RootObject
{
    public List<string> fruits { get; set; }
}

So to deserialize (the JSON root is an array, so the target type is a list of RootObject):

var fruitListContainer = JsonConvert.DeserializeObject<List<RootObject>>(jsonString);

Then you can put all fruits in one list:

List<string> fruits = fruitListContainer.SelectMany(c => c.fruits).ToList();

Now you have all fruits in one list, and you can do whatever you want. For sorting, see the other answers.
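Since the question asks for the fruit(s) that occur the most, here is one possible sketch that works on such a flat list and returns every value tied for the highest count. The names FruitStats and MostFrequent are hypothetical and chosen only for this example; the result is sorted alphabetically just to make the order predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FruitStats
{
    // Hypothetical helper: returns every value tied for the highest count.
    public static List<string> MostFrequent(IEnumerable<string> fruits)
    {
        // Count occurrences of each distinct fruit.
        var counts = fruits
            .GroupBy(f => f)
            .ToDictionary(g => g.Key, g => g.Count());

        // An empty input has no most frequent value.
        if (counts.Count == 0)
            return new List<string>();

        int max = counts.Values.Max();

        // Keep only the fruits whose count equals the maximum.
        return counts
            .Where(kv => kv.Value == max)
            .Select(kv => kv.Key)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }
}
```

For the sample data, FruitStats.MostFrequent(fruits) should return a list containing only "apple"; if two fruits were tied for first place, both would be returned.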

Assuming that the data is in a file named fruits.json, that jq (http://stedolan.github.io/jq/) is on the PATH, and that you're using a Mac or Linux-style shell:

$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1)' fruits.json
{
  "banana": 1,
  "apple": 2,
  "orange": 1
}

On Windows, the same thing will work if the quotation marks are suitably adjusted. Alternatively, if the one-line jq program is put in a file, say fruits.jq, the following command could be run in any supported environment:

jq -f fruits.jq fruits.json

If the data is coming from some other process, you can pipe it into jq, which then reads from standard input instead of a file:

jq -f fruits.jq

One way to find the maximum count is to add a couple of filters that convert the object into key/value entries and pick the entry with the largest value:

$ jq 'reduce (.[].fruits[]) as $fruit ({}; .[$fruit] += 1) |
      to_entries | max_by(.value)' fruits.json
{
  "key": "apple",
  "value": 2
}


 