Binning data from hash in Ruby
I am trying to group users so I can create scatter plots from their data. The data is a Ruby array of hashes that looks like this:
[{"userid"=>"1275", "num"=>"1", "amount"=>"15.00"},
{"userid"=>"1286", "num"=>"3", "amount"=>"26.67"}, .... ]
Basically, the values in num can be integers from 1 to 4, while amount goes up to ~100. I want to bin two levels deep: first group by num, then divide each of the 4 resulting bins further by amount (0-20, 20-50, 50-80, 80+), for 16 groups in total.
The end product should be an array of hashes, or an array of arrays, which I could then pass on to my view to plot in d3. I have a working version that uses case statements and basic flow control, but I'd like to do this with group_by to get shorter, more elegant code.
I don't really understand the documentation on group_by, so any help would be appreciated.
EDIT: The output should look more or less like this:
[[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}],
 [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}],
 [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]
Basically an array with 16 sub-arrays of the key-value pairs.
It looks like you can do this by applying two different group_by operations:
data = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]

# Establish the arbitrary groupings as a set of functions which
# can be evaluated. If these overlap in ranges, the first match
# will be used.
groupings = [
  lambda { |v| v >= 0 && v <= 20 },
  lambda { |v| v > 20 && v <= 50 },
  lambda { |v| v > 50 && v <= 80 },
  lambda { |v| v > 80 }
]

data.group_by do |element|
  # Group by the 'num' key first
  element['num']
end.flat_map do |num, elements|
  # Then group these sets by which of the range buckets
  # they should be sorted into.
  elements.group_by do |element|
    # Create an array that looks like [ false, true, false, ... ]
    # based on the test results, then find the index of the
    # first true entry.
    groupings.map do |fn|
      fn.call(element['amount'].to_f)
    end.index(true)
  end.values
end
# => [[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"}, {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}], [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"}, {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}], [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]
Calling .values on the result of a group_by will give you just the grouped sets, not the keys that indicate which group they are.
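To make the difference concrete, here is a minimal sketch of what group_by returns before and after .values is called. The rows are made-up sample data shaped like the question's, and the variable names are only illustrative.

```ruby
# Made-up sample rows, shaped like the data in the question.
rows = [
  { "userid" => "1", "num" => "1", "amount" => "15.00" },
  { "userid" => "3", "num" => "1", "amount" => "25.00" },
  { "userid" => "5", "num" => "2", "amount" => "15.00" }
]

# group_by returns a Hash: block result => elements that produced that result.
by_num = rows.group_by { |row| row["num"] }

labels = by_num.keys    # the group labels, in order of first appearance
groups = by_num.values  # only the grouped arrays; the labels are dropped

# Keeping the Hash is handy when the label is still needed, e.g. for counting.
counts = by_num.transform_values(&:size)
```

If the view needs to know which bin each sub-array belongs to, passing the Hash (or its keys alongside the values) may be more convenient than the bare array of arrays.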
Maybe like this?
I'm using Array's group_by, and I take the amount into account by binning it and folding the bin into the group_by key:
# bin must be defined before the group_by call that uses it.
def bin(val)
  amount = val.to_f # to_f, so that e.g. "20.50" is not truncated to 20
  return 0 if amount <= 20
  return 1 if amount <= 50
  return 2 if amount <= 80
  3
end

arr = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]
a2 = arr.group_by { |i| (i['num'].to_i - 1) + 4 * bin(i['amount']) }.values
The result contains the groups you wanted (the order of the groups, and of the keys inside each hash, depends on your Ruby version):
[[{"amount"=>"15.00", "num"=>"1", "userid"=>"1"}, {"amount"=>"19.00", "num"=>"1", "userid"=>"2"}], [{"amount"=>"15.00", "num"=>"2", "userid"=>"5"}], [{"amount"=>"25.00", "num"=>"1", "userid"=>"3"}, {"amount"=>"30.00", "num"=>"1", "userid"=>"4"}]]
I'm actually mapping the two parameters onto a single one-dimensional key (like a hash function), so the function is actually:
<max value of num>*<bin according to amount>+<num-1>
If the max value of num is 4, bin 0 maps to 0..3, bin 1 maps to 4..7, bin 2 maps to 8..11 and bin 3 maps to 12..15. The ranges do not overlap, which is important, because each (num, bin) pair then gets its own key.
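The no-overlap claim can be checked by enumerating every combination. This is only a sketch: flat_key is a hypothetical helper name that restates the formula above, and it assumes num runs from 1 to 4 and the bin index from 0 to 3.

```ruby
# Hypothetical helper restating the formula:
#   <max value of num> * <bin according to amount> + <num - 1>
def flat_key(num, bin, max_num = 4)
  max_num * bin + (num - 1)
end

# Compute the key for every (bin, num) combination.
keys = (0..3).flat_map do |bin|
  (1..4).map { |num| flat_key(num, bin) }
end

# True when the 16 combinations cover 0..15 with no key used twice.
collision_free = keys.sort == (0..15).to_a
```

group_by also accepts any object as a key, so an array such as [num, bin] would work as the key too and avoids the arithmetic, at the cost of a less compact label.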
I figured out a way to do it and added another piece of code to unwrap the hashes and return just the user ids in each group:
users_by_number = firstMonth.group_by {|i| i["num"]}
users_by_number.each_pair do |key, value|
  users_by_number[key] = value.group_by do |j|
    case
    when j["amount"].to_f <=20 then :twenty
    when j["amount"].to_f <=50 then :twenty_fifty
    when j["amount"].to_f <=80 then :fifty_eighty
    when j["amount"].to_f > 80 then :eighty_plus
    end
  end
  users_by_number[key].each_pair do |group, users|
    users_by_number[key][group] = users.map! {|user| user["userid"].to_i}
  end
end
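One thing to keep in mind: group_by only creates groups that have at least one member, so any num/amount combination without users is missing from the result. If the plot really needs all 16 sub-arrays, one possible sketch is to look up every combination explicitly. The names bin_index and all_bins are hypothetical, the rows are made-up sample data, and the bin edges follow the <= 20 / <= 50 / <= 80 convention used above.

```ruby
# Hypothetical helper: index of the amount bin (0-20, 20-50, 50-80, 80+).
def bin_index(amount)
  [20, 50, 80].index { |upper| amount.to_f <= upper } || 3
end

# Returns 16 arrays, ordered by num (1..4) and then by amount bin (0..3).
# Combinations with no users come back as empty arrays.
def all_bins(rows)
  grouped = rows.group_by { |row| [row["num"].to_i, bin_index(row["amount"])] }
  (1..4).flat_map do |num|
    (0..3).map { |bin| grouped.fetch([num, bin], []) }
  end
end

# Made-up sample rows, shaped like the data in the question.
rows = [
  { "userid" => "1", "num" => "1", "amount" => "15.00" },
  { "userid" => "3", "num" => "1", "amount" => "25.00" },
  { "userid" => "5", "num" => "2", "amount" => "15.00" }
]

bins = all_bins(rows)
```

Because the position in the outer array encodes the combination (index = 4 * (num - 1) + bin), the view can recover num and the amount range from the index alone.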