
Binning data from hash in Ruby

I am trying to group users so that I can create scatter plots from their data. The data is a Ruby array of hashes that looks like this:

[{"userid"=>"1275", "num"=>"1", "amount"=>"15.00"}, 
 {"userid"=>"1286", "num"=>"3", "amount"=>"26.67"}, .... ] 

Basically, the values in num can be integers from 1 to 4, while amount goes up to ~100. I want to bin two levels deep, first grouping by num, and then each of the 4 new bins should be divided further by amount (0-20, 20-50, 50-80, 80+) for 16 groups total.

The end product should be an array of hashes, or an array of arrays, which I could then pass on to my view to plot in d3. I have a working version that uses case statements and basic conditional flow control, but I'd like to use the group_by method to get shorter, more elegant code.

I don't really understand the documentation on group_by, so any help would be appreciated.

EDIT: The output should be something more or less like this

[[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}],
 [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}],
 [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]

Basically an array with 16 sub arrays of the key value pairs.

It looks like you can do this by applying two different group_by operations:

data = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]

# Establish the arbitrary groupings as a set of functions which
# can be evaluated. If these overlap in ranges, the first match
# will be used.
groupings = [
  lambda { |v| v >= 0 && v <= 20 },
  lambda { |v| v > 20 && v <= 50 },
  lambda { |v| v > 50 && v <= 80 },
  lambda { |v| v > 80 }
]

data.group_by do |element|
  # Group by the 'num' key first
  element['num']
end.flat_map do |num, elements|
  # Then group these sets by which of the range buckets
  # they should be sorted into.
  elements.group_by do |element|
    # Create an array that looks like [ false, true, false, ... ]
    # based on the test results, then find the index of the
    # first true entry.
    groupings.map do |fn|
      fn.call(element['amount'].to_f)
    end.index(true)
  end.values
end

# => [[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"}, {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}], [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"}, {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}], [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]

Calling .values on the result of a group_by will give you just the grouped sets, not the keys that indicate which group they are.
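
To make that difference concrete, here is a minimal sketch showing what group_by returns before and after .values. The rows array is made-up sample data, not taken from the question.

```ruby
# Hypothetical sample data, introduced only for this illustration.
rows = [
  { "userid" => "1", "num" => "1" },
  { "userid" => "2", "num" => "2" },
  { "userid" => "3", "num" => "1" }
]

# group_by returns a Hash: block result => array of matching elements.
grouped = rows.group_by { |row| row["num"] }
# grouped.keys => ["1", "2"]

# .values drops the keys and keeps only the grouped arrays.
groups_only = grouped.values
user_ids = groups_only.map { |group| group.map { |row| row["userid"] } }
# user_ids => [["1", "3"], ["2"]]
```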

Maybe like this?

I'm using the array's group_by method, but binning the amount first and folding the bin number into the group_by key:

arr = [{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},{"userid"=>"2", "num"=>"1", "amount"=>"19.00"},{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},{"userid"=>"4", "num"=>"1", "amount"=>"30.00"},{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]

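# Map an amount to its bin number (0..3). Defined before it is used,
# because a top-level method must exist by the time the call runs.
def bin(val)
  amount = val.to_f  # to_f rather than to_i, so that "20.50" lands in bin 1
  return 0 if amount <= 20
  return 1 if amount <= 50
  return 2 if amount <= 80
  3
end

a2 = arr.group_by { |i| (i['num'].to_i - 1) + 4 * bin(i['amount']) }.values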

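The result contains exactly the groups you asked for. The order of the groups, and of the keys within each printed hash, depends on the Ruby version: from Ruby 1.9 on, hashes preserve insertion order.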

[[{"amount"=>"15.00", "num"=>"1", "userid"=>"1"}, {"amount"=>"19.00", "num"=>"1", "userid"=>"2"}], [{"amount"=>"15.00", "num"=>"2", "userid"=>"5"}], [{"amount"=>"25.00", "num"=>"1", "userid"=>"3"}, {"amount"=>"30.00", "num"=>"1", "userid"=>"4"}]]

What I'm actually doing is mapping the two parameters onto a single one-dimensional index (much like a hash function). The function is:

<max value of num>*<bin according to amount>+<num-1>

If the maximum value of num is 4, then bin 0 maps to 0..3, bin 1 maps to 4..7, bin 2 maps to 8..11, and bin 3 maps to 12..15. The ranges don't overlap, which is important: every (num, bin) pair gets its own group.
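
Because the index is collision-free, it can also address a preallocated array directly, which yields all 16 groups (empty ones included) rather than only the non-empty groups that group_by returns. The following is a sketch under assumptions: num_max, bin_upper_bounds, amount_bin, and flat_index are names introduced here for illustration, and amounts are parsed with to_f.

```ruby
num_max = 4                      # assumed maximum value of "num"
bin_upper_bounds = [20, 50, 80]  # upper edges of the first three amount bins

# Returns 0..3: the position of the first upper bound the amount does not
# exceed, or 3 when the amount is above every bound.
amount_bin = lambda do |amount|
  bin_upper_bounds.index { |limit| amount.to_f <= limit } || bin_upper_bounds.size
end

# <max value of num> * <bin according to amount> + <num - 1>
flat_index = lambda do |row|
  num_max * amount_bin.call(row["amount"]) + (row["num"].to_i - 1)
end

data = [
  { "userid" => "1", "num" => "1", "amount" => "15.00" },
  { "userid" => "2", "num" => "1", "amount" => "19.00" },
  { "userid" => "3", "num" => "1", "amount" => "25.00" },
  { "userid" => "4", "num" => "1", "amount" => "30.00" },
  { "userid" => "5", "num" => "2", "amount" => "15.00" }
]

# One slot per (num, bin) combination, each starting as an empty array.
bins = Array.new(num_max * (bin_upper_bounds.size + 1)) { [] }
data.each { |row| bins[flat_index.call(row)] << row }

# Indices of the slots that actually received users.
occupied = bins.each_index.reject { |i| bins[i].empty? }
# occupied => [0, 1, 4]
```

With this layout the groups are ordered by amount bin first and by num second; swapping the roles of the two terms in the formula would order them by num first.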

I figured out a way to do it, and added another piece of code that unpacks the nested hash and returns only the user ids in each group:

# firstMonth is assumed to hold the array of user hashes from the question.
users_by_number = firstMonth.group_by { |i| i["num"] }
users_by_number.each_pair do |key, value|
  # Second level: bucket each num group by amount range.
  users_by_number[key] = value.group_by do |j|
    amount = j["amount"].to_f
    case
    when amount <= 20 then :twenty
    when amount <= 50 then :twenty_fifty
    when amount <= 80 then :fifty_eighty
    when amount > 80 then :eighty_plus
    end
  end

  # Replace each user hash with just its integer user id.
  users_by_number[key].each_pair do |group, users|
    users_by_number[key][group] = users.map! { |user| user["userid"].to_i }
  end
end
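
The same nested structure can be built without reassigning hash entries while iterating. This is only a sketch of an alternative, assuming Ruby 2.4 or later for Hash#transform_values; first_month is a stand-in for the firstMonth array above, and label_for is a helper name introduced here.

```ruby
# Hypothetical stand-in for the firstMonth array used above.
first_month = [
  { "userid" => "1", "num" => "1", "amount" => "15.00" },
  { "userid" => "2", "num" => "1", "amount" => "19.00" },
  { "userid" => "3", "num" => "1", "amount" => "25.00" },
  { "userid" => "4", "num" => "1", "amount" => "30.00" },
  { "userid" => "5", "num" => "2", "amount" => "15.00" }
]

# Same amount ranges and labels as the case statement above.
label_for = lambda do |amount|
  if amount <= 20 then :twenty
  elsif amount <= 50 then :twenty_fifty
  elsif amount <= 80 then :fifty_eighty
  else :eighty_plus
  end
end

by_num = first_month.group_by { |user| user["num"] }

# transform_values builds a new hash with the same keys and mapped values.
user_ids_by_group = by_num.transform_values do |users|
  by_label = users.group_by { |user| label_for.call(user["amount"].to_f) }
  by_label.transform_values { |group| group.map { |user| user["userid"].to_i } }
end
# user_ids_by_group =>
#   {"1"=>{:twenty=>[1, 2], :twenty_fifty=>[3, 4]}, "2"=>{:twenty=>[5]}}
```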

