How to find the number of unique occurrences for an Array in Ruby

I have an array containing n elements. Each element contains two words.

This makes the array look like this: ['England John', 'England Ben', 'USA Paul', 'England John']

I want to find the number of unique names for each country. For example, England would have 2 unique names, because John appears twice.

So far I have split the array into two arrays, one containing the countries, such as ['England', 'USA', ...], and the other containing the names, such as ['John', 'Paul', ...]. However, I'm unsure where to go from here.

One-liner option:

ary.uniq.group_by { |e| e.split.first }.transform_values(&:count)
#=> {"England"=>2, "USA"=>1}
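To make the chain easier to follow, here is a sketch that splits the one-liner into its three steps and shows each intermediate value. The variable names deduped, grouped and counts are introduced only for illustration, and transform_values assumes Ruby 2.4 or later.

```ruby
ary = ['England John', 'England Ben', 'USA Paul', 'England John']

# Step 1: drop exact duplicate "Country Name" strings.
deduped = ary.uniq
# deduped == ["England John", "England Ben", "USA Paul"]

# Step 2: group the remaining strings by their first word (the country).
grouped = deduped.group_by { |e| e.split.first }
# grouped == {"England"=>["England John", "England Ben"], "USA"=>["USA Paul"]}

# Step 3: replace each group with its size.
counts = grouped.transform_values(&:count)
# counts == {"England"=>2, "USA"=>1}
```

Because uniq runs before the grouping, the second 'England John' never reaches the count.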

The problem, really, is that you're storing this data as an array of strings. This is a poor choice of data structure, as it makes manipulation much harder.

Suppose, for example, we first convert this data into a Hash, which maps each country to the list of names:

data = ['England John', 'England Ben', 'USA Paul', 'England John']

mapped_names = {}

data.each do |item|
  country, name = item.split
  mapped_names[country] ||= []
  mapped_names[country] << name
end

Now, obtaining the count is quite easy:

mapped_name_counts = mapped_names.transform_values { |names| names.uniq.count }

The resulting variables are:

mapped_names # => {"England"=>["John", "Ben", "John"], "USA"=>["Paul"]}
mapped_name_counts # => {"England"=>2, "USA"=>1}
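If this is needed in more than one place, the two steps could be wrapped in a small method. The sketch below is one possible way to do that; the method name unique_name_counts is hypothetical, and it uses a Hash with a default block in place of the `||= []` line, which is a stylistic choice rather than a requirement.

```ruby
# Hypothetical helper: maps each country to its number of distinct names.
def unique_name_counts(data)
  # The default block creates and stores a fresh array for each new country.
  mapped_names = Hash.new { |hash, country| hash[country] = [] }

  data.each do |item|
    country, name = item.split
    mapped_names[country] << name
  end

  # Count the distinct names per country.
  mapped_names.transform_values { |names| names.uniq.count }
end

data = ['England John', 'England Ben', 'USA Paul', 'England John']
result = unique_name_counts(data)
# result == {"England"=>2, "USA"=>1}
```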

And on Ruby 2.7 or later, which adds Enumerable#tally, the counts can also be computed directly from the original array:

mapped_name_counts = data.uniq.map { |item| item.split.first }.tally

A bit more verbose than the other solutions, but it does not use transform_values (which is part of Ruby core only from version 2.4 on; before that it came from ActiveSupport).

require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  country_data = accu[country] ||= Set.new
  country_data << name
end

names_per_country.each do |country, names|
  puts "#{country} has #{names.size} unique name(s)"
end

# => England has 2 unique name(s)
# => USA has 1 unique name(s)
# => Switzerland has 1 unique name(s)

This solution first transforms the array into a Hash, where the key is the country name and the value is a Set. I've chosen Set because it takes care of the uniqueness part of your question automatically (a Set cannot contain duplicates).

After that, you can find the number of unique names per country by checking the size of the Set. You can also retrieve the names themselves (the elements of the Set) if required.
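As a rough illustration of reading both the count and the names back out, the sketch below rebuilds the same Hash of Sets and queries it for one country. The variable names england_count, england_names and has_ben are made up for this example.

```ruby
require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  (accu[country] ||= Set.new) << name
end

# Number of unique names for one country.
england_count = names_per_country["England"].size

# The unique names themselves, converted from a Set to a sorted Array.
england_names = names_per_country["England"].to_a.sort

# Check whether a given name was seen for a country.
has_ben = names_per_country["England"].include?("Ben")
```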

arr = ['England John', 'England Ben', 'USA Paul', 'England John']

arr.uniq.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 }
  #=> {"England"=>2, "USA"=>1}

This requires two passes through the array (arr.uniq being the first). To make only a single pass, one could do the following.

require 'set'

uniques = Set.new
arr.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 if uniques.add?(s) }
  #=> {"England"=>2, "USA"=>1}

See the form of Hash::new that takes an argument (called the default value), and also Set#add?.
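For readers unfamiliar with these two methods, here is a small sketch of the behavior the code above relies on. The variable names are chosen only for this illustration.

```ruby
require 'set'

# Hash.new(0) returns 0 for a missing key, so += 1 works on first use.
counts = Hash.new(0)
counts["England"] += 1
counts["England"] += 1
missing = counts["France"] # reads the default; the key is not stored

# Set#add? returns the set itself when the element is new,
# and nil when the element is already present.
seen = Set.new
first_add  = seen.add?("England John")
second_add = seen.add?("England John")
```

This is why `if uniques.add?(s)` increments the counter only the first time each string is encountered.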

It's not clear to me which of the two calculations would generally be faster.
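One way to find out for a particular workload is to measure it. The following is only a sketch using Ruby's standard Benchmark module; the generated test data (four countries, 500 made-up names, 100,000 elements) is an assumption chosen for illustration, and the timings will depend on the Ruby version and on how many duplicates the real data contains.

```ruby
require 'benchmark'
require 'set'

# Hypothetical test data: 100,000 "Country Name" strings with many duplicates.
countries = %w[England USA Switzerland France]
names = (1..500).map { |i| "Name#{i}" }
rng = Random.new(42)
arr = Array.new(100_000) do
  "#{countries.sample(random: rng)} #{names.sample(random: rng)}"
end

two_pass = nil
one_pass = nil

Benchmark.bm(10) do |x|
  # uniq first, then count per country.
  x.report("two pass:") do
    two_pass = arr.uniq.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 }
  end

  # Single pass, tracking already-seen strings in a Set.
  x.report("one pass:") do
    uniques = Set.new
    one_pass = arr.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 if uniques.add?(s) }
  end
end
```

Both variants should produce the same Hash, so comparing two_pass and one_pass is a quick sanity check before trusting the timings.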
