Find duplicates in a Ruby array of arrays, sum their values, then combine them

I have a pipe-delimited text file that I am looping through. It looks like this:

123|ADAM JOHNSON|AAUA|||||||||||1||||
123||AAUA||||||||8675|90.0|90.0||||||||
444|STEVE SMITH|AAUA|||||||||||1|||||
444||AAUA||||||||2364|50.0|50.0|||||||
444||AAUA||||||||8453|50.0|50.0||||
567|ALLEN JONES|AAUA|||||||||||1||||||
567||AAUA||||||||6578|75.0|75.0||||||
567||AAUA||||||||1234|10.0|10.0||||
567||AAUA||||||||1234|15.0|15.0|||||

I start by taking the values at indexes 0, 10 and 11 of every row whose name field (index 1) is empty, and putting them into an array of arrays like so:

require 'csv'

@group_array = []
# Keep only the detail rows (those with an empty name field at index 1).
CSV.foreach('data.txt', col_sep: '|') do |row|
  if row[1].nil?
    @group_array << [row[0], [row[10], row[11]]]
  end
end

So I get something like:

[["123", ["8675", "90.0"]]
------------------
["444", ["2364", "50.0"]]
["444", ["8453", "50.0"]]
------------------
["567", ["6578", "75.0"]]
["567", ["1234", "10.0"]]
["567", ["1234", "15.0"]]]

What I am struggling with is the next step. I need to loop through the arrays and group the entries that share the same first element (the 3-digit id). Within each group, I need to find entries whose inner array has the same 4-digit id, add their float values together, and produce a final array in which the duplicates are removed and their values are summed.

Expected output should look like:

[["0006310001", ["789663473", "90.0"]],
["0006410001", ["297103188", "50.0"]],
["0006410001", ["757854164", "50.0"]],
["0006610001", ["557493572", "75.0"]],
["0006610001", ["981894386", "25.0"]]]

Based on how it's created, I don't think each data item looks like [["123", ["8675", "90.0"]]]. More likely it looks like ["123", ["8675", "90.0"]].

In which case, this should do what you want:

group_array = @group_array.sort
result = []
save_3 = nil
save_4 = nil
total = 0.0
group_array.each do |group|
  if group[0] != save_3 || group[1][0] != save_4
    result << [save_3, [save_4, total.to_s]] if save_3
    total = 0.0
    save_3 = group[0]
    save_4 = group[1][0]
  end
  total += group[1][1].to_f
end
result << [save_3, [save_4, total.to_s]] if save_3
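For comparison, the same combining step can be sketched with `group_by` and `sum` instead of the sort-and-scan loop. This is only an illustration, not the answer's method: it assumes the flat `[id, [sub_id, amount]]` shape described above, hard-codes the sample rows from the question, and needs Ruby 2.4 or later for `Array#sum`. The names `combined`, `sub_id` and `amount` are made up for the example.

```ruby
# Sample input in the flat [id, [sub_id, amount]] shape (values from the question).
group_array = [
  ['123', ['8675', '90.0']],
  ['444', ['2364', '50.0']],
  ['444', ['8453', '50.0']],
  ['567', ['6578', '75.0']],
  ['567', ['1234', '10.0']],
  ['567', ['1234', '15.0']]
]

# Group rows that share both ids, then sum the amounts within each group.
combined = group_array
  .group_by { |id, (sub_id, _amount)| [id, sub_id] }
  .map do |(id, sub_id), rows|
    total = rows.sum { |_id, (_sub_id, amount)| amount.to_f }
    [id, [sub_id, total.to_s]]
  end

combined.each { |entry| p entry }
```

Because a Ruby hash keeps insertion order, the groups come out in the order their keys first appear, so no sort is needed here.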

OK, here is something that should scale a bit better:

group_hash = Hash.new(0.0)
CSV.foreach('data.txt', col_sep: '|') do |row|
  if row[1].nil?
    group_hash[ [row[0], row[10]] ] += row[11].to_f
  end
end
# produces a hash with key [a, b] value sum_of_c
# if you need an array of [a, [b,c]] then next step is...
@group_array = group_hash.map { |(a, b), c| [a, [b, c.to_s]] }
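To try the hash-based approach without a data file, here is a small self-contained sketch that parses an inline string with `CSV.parse`. The `detail_line` helper is hypothetical and exists only to build rows with the sub-id at index 10 and the amount at index 11, as in the question. The totals are converted back to strings on the assumption that the final array should match the string format of the expected output.

```ruby
require 'csv'

# Hypothetical helper: builds a pipe-delimited detail line with the sub-id at
# index 10 and the amount at index 11, mirroring the layout in the question.
def detail_line(id, sub_id, amount)
  [id, nil, 'AAUA', *Array.new(7), sub_id, amount, amount].join('|')
end

data = [
  '567|ALLEN JONES|AAUA', # name row: row[1] is set, so it is skipped
  detail_line('567', '6578', '75.0'),
  detail_line('567', '1234', '10.0'),
  detail_line('567', '1234', '15.0')
].join("\n")

# Accumulate amounts under a composite [id, sub_id] key.
totals = Hash.new(0.0)
CSV.parse(data, col_sep: '|') do |row|
  next unless row[1].nil?
  totals[[row[0], row[10]]] += row[11].to_f
end

# Convert the hash back into the [id, [sub_id, total]] array shape.
group_array = totals.map { |(id, sub_id), total| [id, [sub_id, total.to_s]] }
p group_array
```

Keying the hash on the `[id, sub_id]` pair means the file is read once and nothing has to be sorted, which is why this version should cope better with large inputs.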
