I am trying to group users so that I can create scatter plots from their data. The data is a Ruby array of hashes that looks like this:
[{"userid"=>"1275", "num"=>"1", "amount"=>"15.00"},
{"userid"=>"1286", "num"=>"3", "amount"=>"26.67"}, .... ]
Basically, the values in num are integers from 1 to 4, while amount goes up to ~100. I want to bin two levels deep: first group by num, then divide each of the 4 resulting bins further by amount (0-20, 20-50, 50-80, 80+), for 16 groups total.
The end product should be an array of hashes, or an array of arrays, which I can then pass to my view to plot in d3. I have a working version that uses case statements and basic conditionals, but I'd like to use group_by to get shorter, more elegant code.
I don't really understand the documentation on group_by, so any help would be appreciated.
EDIT: The output should look more or less like this:
[[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}],
 [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}],
 [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]
Basically an array of 16 sub-arrays, each holding the hashes that belong to one group.
It looks like you can do this by applying two different group_by operations:
data = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]
# Establish the arbitrary groupings as a set of functions which
# can be evaluated. If these overlap in ranges, the first match
# will be used.
groupings = [
  lambda { |v| v >= 0 && v <= 20 },
  lambda { |v| v > 20 && v <= 50 },
  lambda { |v| v > 50 && v <= 80 },
  lambda { |v| v > 80 }
]
data.group_by do |element|
  # Group by the 'num' key first
  element['num']
end.flat_map do |num, elements|
  # Then group these sets by which of the range buckets
  # they should be sorted into.
  elements.group_by do |element|
    # Create an array that looks like [ false, true, false, ... ]
    # based on the test results, then find the index of the
    # first true entry.
    groupings.map do |fn|
      fn.call(element['amount'].to_f)
    end.index(true)
  end.values
end
# => [[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"}, {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}], [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"}, {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}], [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]
Calling .values on the result of a group_by will give you just the grouped sets, not the keys that indicate which group they are.
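As a minimal sketch of that last point, here is group_by with and without .values on a made-up list of words (the data and variable names are hypothetical, chosen only for illustration):

```ruby
# Hypothetical sample data, not taken from the question.
words = %w[apple avocado banana blueberry cherry]

# group_by returns a Hash: block result => elements that produced it.
by_initial = words.group_by { |word| word[0] }
# by_initial is {"a"=>["apple", "avocado"], "b"=>["banana", "blueberry"], "c"=>["cherry"]}

# .values drops the keys and keeps only the groups.
groups = by_initial.values
# groups is [["apple", "avocado"], ["banana", "blueberry"], ["cherry"]]
```

Keep the hash instead of calling .values if the view needs to know which bucket each group came from.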
Maybe something like this?
I'm using Array#group_by, but I also take the amount into account by binning it and folding the bin into the group_by key:
arr = [{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},{"userid"=>"2", "num"=>"1", "amount"=>"19.00"},{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},{"userid"=>"4", "num"=>"1", "amount"=>"30.00"},{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]
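# Map an amount to its bin index (0..3). Defined before it is called below.
def bin(val)
  amount = val.to_f # to_f rather than to_i, so "20.50" lands in the 20-50 bin
  return 0 if amount <= 20
  return 1 if amount <= 50
  return 2 if amount <= 80
  3
end
a2 = arr.group_by { |i| (i['num'].to_i - 1) + 4 * bin(i['amount']) }.values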
The result contains exactly the groups you wanted (the order of the groups can differ between Ruby versions):
[[{"amount"=>"15.00", "num"=>"1", "userid"=>"1"}, {"amount"=>"19.00", "num"=>"1", "userid"=>"2"}], [{"amount"=>"15.00", "num"=>"2", "userid"=>"5"}], [{"amount"=>"25.00", "num"=>"1", "userid"=>"3"}, {"amount"=>"30.00", "num"=>"1", "userid"=>"4"}]]
What I'm actually doing is mapping the two parameters onto a single one-dimensional key (much like a hash function). The key is
<max value of num>*<bin according to amount>+<num-1>
If the max value of num is 4, then bin 0 maps to keys 0..3, bin 1 to 4..7, bin 2 to 8..11, and bin 3 to 12..15. The ranges do not overlap, which is what keeps the groups distinct.
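Because that key is always an integer in 0..15, it can also serve as an array index. The sketch below is one possible way to use it, assuming num is always 1 to 4 as the question states. It pre-allocates all 16 groups, so bins with no users appear as empty arrays instead of being missing. The names max_num, amount_bin and slot_for are made up for this example.

```ruby
max_num = 4              # assumption: num is always 1..4
upper_edges = [20, 50, 80]

# Bin index 0..3 for an amount string; anything above 80 falls in bin 3.
amount_bin = lambda do |amount|
  value = amount.to_f
  upper_edges.index { |edge| value <= edge } || upper_edges.length
end

# <max value of num> * <bin according to amount> + <num - 1>
slot_for = lambda do |row|
  max_num * amount_bin.call(row["amount"]) + (row["num"].to_i - 1)
end

rows = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]

# One empty array per possible (num, bin) pair, then drop each row in its slot.
slots = Array.new(max_num * (upper_edges.length + 1)) { [] }
rows.each { |row| slots[slot_for.call(row)] << row }

# Every (bin, num) pair gets its own key: together they cover 0..15 exactly once.
all_keys = (0..3).flat_map { |b| (1..4).map { |n| max_num * b + (n - 1) } }
```

With the sample rows, slots[0] holds users 1 and 2, slots[4] holds users 3 and 4, slots[1] holds user 5, and the other 13 slots stay empty.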
I figured out a way to do it, and added another piece of code that replaces the hashes in each group with just the user ids:
users_by_number = firstMonth.group_by { |i| i["num"] }
users_by_number.each_pair do |key, value|
  users_by_number[key] = value.group_by do |j|
    case
    when j["amount"].to_f <= 20 then :twenty
    when j["amount"].to_f <= 50 then :twenty_fifty
    when j["amount"].to_f <= 80 then :fifty_eighty
    when j["amount"].to_f > 80 then :eighty_plus
    end
  end
  users_by_number[key].each_pair do |group, users|
    users_by_number[key][group] = users.map! { |user| user["userid"].to_i }
  end
end
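For comparison, the same nested result can probably be built without reassigning hash entries in place. This is only a sketch, assuming Ruby 2.4 or later for Hash#transform_values. Here first_month is a hypothetical stand-in for the firstMonth data: its first two rows come from the question and the last two are invented.

```ruby
# Hypothetical stand-in for firstMonth; the last two rows are made up.
first_month = [
  {"userid"=>"1275", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"1286", "num"=>"3", "amount"=>"26.67"},
  {"userid"=>"1290", "num"=>"1", "amount"=>"85.10"},
  {"userid"=>"1301", "num"=>"1", "amount"=>"12.50"}
]

# Same bucket names and edges as the case statement above.
bucket_for = lambda do |amount|
  if amount <= 20 then :twenty
  elsif amount <= 50 then :twenty_fifty
  elsif amount <= 80 then :fifty_eighty
  else :eighty_plus
  end
end

# Group by num, then by amount bucket, then keep only the integer user ids.
by_num = first_month.group_by { |row| row["num"] }
user_ids_by_group = by_num.transform_values do |rows|
  by_bucket = rows.group_by { |row| bucket_for.call(row["amount"].to_f) }
  by_bucket.transform_values { |group| group.map { |row| row["userid"].to_i } }
end
# user_ids_by_group is
# {"1"=>{:twenty=>[1275, 1301], :eighty_plus=>[1290]}, "3"=>{:twenty_fifty=>[1286]}}
```

Each step returns a new hash, so the original rows are left untouched, which may be easier to reason about than the in-place map! version.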