Convert an array into an index hash in Ruby

I have an array, and I want to make a hash so I can quickly ask "is X in the array?".

In Perl, there is an easy (and fast) way to do this:

my @array = qw( 1 2 3 );
my %hash;
@hash{@array} = undef;

This generates a hash that looks like:

{
    1 => undef,
    2 => undef,
    3 => undef,
}

The best I've come up with in Ruby is:

array = [1, 2, 3]
hash = Hash[array.map {|x| [x, nil]}]

which gives:

{1=>nil, 2=>nil, 3=>nil}

Is there a better Ruby way?

EDIT 1

No, Array#include? is not a good idea. It's slow: it does a lookup in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:

#!/usr/bin/ruby -w
require 'benchmark'

array = (1..1_000_000).to_a
hash = Hash[array.map {|x| [x, nil]}]

Benchmark.bm(15) do |x|
    x.report("Array.include?") { 1000.times { array.include?(500_000) } }
    x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
end

Produces:

                     user     system      total        real
Array.include?  46.190000   0.160000  46.350000 ( 46.593477)
Hash.include?    0.000000   0.000000   0.000000 (  0.000523)

If all you need the hash for is membership, consider using a Set:

Set

Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

Set is easy to use with Enumerable objects (implementing each). Most of the initializer methods and binary operators accept generic Enumerable objects besides sets and arrays. An Enumerable object can be converted to Set using the to_set method.

Set uses Hash as storage, so you must note the following points:

  • Equality of elements is determined according to Object#eql? and Object#hash.
  • Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an unreliable state.
  • When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.

Comparison

The comparison operators <, >, <= and >= are implemented as shorthand for the {proper_,}{subset?,superset?} methods. However, the <=> operator is intentionally left out because not every pair of sets is comparable ({x,y} vs. {x,z}, for example).

Example

require 'set'
s1 = Set.new [1, 2]                   # -> #<Set: {1, 2}>
s2 = [1, 2].to_set                    # -> #<Set: {1, 2}>
s1 == s2                              # -> true
s1.add("foo")                         # -> #<Set: {1, 2, "foo"}>
s1.merge([2, 6])                      # -> #<Set: {1, 2, "foo", 6}>
s1.subset? s2                         # -> false
s2.subset? s1                         # -> true

[...]

Public Class Methods

new(enum = nil)

Creates a new set containing the elements of the given enumerable object.

If a block is given, the elements of enum are preprocessed by the given block.
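
To make the quoted storage notes a little more concrete, here is a minimal sketch of what they imply in practice. The method names are made up for illustration and are not part of the Set API; the behaviour shown is what the documentation above describes.

```ruby
require 'set'

# Membership follows eql? and hash, so Integer 1 and Float 1.0 are separate
# members even though 1 == 1.0 is true.
def distinct_numeric_members
  Set.new([1, 1.0]).size
end

# An unfrozen String is stored as a frozen copy; the caller's string is untouched.
def string_storage
  original = String.new("foo") # String.new guarantees an unfrozen string
  set = Set.new
  set << original
  { stored_frozen: set.first.frozen?, original_frozen: original.frozen? }
end

# The < operator is shorthand for proper_subset?
def proper_subset_via_operator?(a, b)
  a.to_set < b.to_set
end
```

Calling distinct_numeric_members should give 2, and proper_subset_via_operator?([1, 2], [1, 2, 3]) should be true, while the same call with two equal arrays should be false.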

Try this one:

a=[1,2,3]
Hash[a.zip]

You can do this very handy trick:

Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}

If you want to quickly ask "is X in the array?" you should use Array#include?.

Edit (in response to addition in OP):

If you want speedy lookup times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.

require 'benchmark'
require 'set'

array = (1..1_000_000).to_a
set = array.to_set

Benchmark.bm(15) do |x|
    x.report("Array.include?") { 1000.times { array.include?(500_000) } }
    x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end

Results on my machine:

                     user     system      total        real
Array.include?  36.200000   0.140000  36.340000 ( 36.740605)
Set.include?     0.000000   0.000000   0.000000 (  0.000515)

You should consider just using a set to begin with, instead of an array, so that a conversion is never necessary.

I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:

hash = {}
array.each{|x| hash[x] = nil}

It doesn't look particularly elegant, but it's clear, and does the job.

FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-length list of values:

Hash[a, 1, b, 2] # => {a => 1, b => 2}

so I tried changing your code to:

hash = Hash[*array.map {|x| [x, nil]}.flatten]

but the performance is dire:

#!/usr/bin/ruby -w
require 'benchmark'

array = (1..100_000).to_a

Benchmark.bm(15) do |x|
  x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
  x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end

gives:

                     user     system      total        real
assignment loop  0.440000   0.200000   0.640000 (  0.657287)
hash constructor  4.440000   0.250000   4.690000 (  4.758663)

Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.
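
For readers on a newer Ruby than the 1.8.6 used above, the same hash can be built a couple of other ways. This is only a sketch: the method names are made up, the last variant assumes Ruby 2.6 or later (where Array#to_h accepts a block), and no claim is made here about how the variants compare in speed.

```ruby
# Plain assignment loop, as in the answer above.
def index_hash_loop(array)
  hash = {}
  array.each { |x| hash[x] = nil }
  hash
end

# Same loop expressed with each_with_object, which returns the hash itself.
def index_hash_each_with_object(array)
  array.each_with_object({}) { |x, hash| hash[x] = nil }
end

# Array#to_h with a block (assumes Ruby 2.6+); the block returns [key, value] pairs.
def index_hash_to_h(array)
  array.to_h { |x| [x, nil] }
end
```

All three should return the same hash for the same input, so picking one is mostly a matter of taste and of which Ruby version you have to support.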

Rampion beat me to it. Set might be the answer.

You can do:

require 'set'
set = array.to_set
set.include?(x)

Your way of creating the hash looks good. I had a muck around in irb and this is another way:

>> [1,2,3,4].inject(Hash.new) { |h,i| {i => nil}.merge(h) }
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}

I think chrismear's point on using assignment over creation is great. To make the whole thing a little more Ruby-esque, though, I might suggest assigning something other than nil to each element:

hash = {}
array.each { |x| hash[x] = 1 } # or true or something else "truthy"
...
if hash[376]                   # instead of if hash.has_key?(376)
  ...
end

The problem with assigning nil is that you have to use has_key? instead of [], since [] gives you nil (your marker value) if the Hash doesn't have the specified key. You could get around this by using a different default value, but why go through the extra work?

# much less elegant than above:
hash = Hash.new(42)
array.each { |x| hash[x] = nil }
...
unless hash[376]
  ...
end
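
A quick sketch of the difference described above, using a hypothetical helper (membership_results is not from any library) that builds both kinds of hash and probes them:

```ruby
# Builds one hash with nil markers and one with true markers, then reports
# what each style of lookup returns for the probe value.
def membership_results(array, probe)
  nil_marked  = {}
  true_marked = {}
  array.each do |x|
    nil_marked[x]  = nil
    true_marked[x] = true
  end
  {
    nil_index:  nil_marked[probe],          # nil whether or not probe is a key
    nil_key:    nil_marked.has_key?(probe), # the only reliable test with nil markers
    true_index: true_marked[probe]          # true if present, nil if absent
  }
end
```

With nil markers, nil_index is nil for both present and absent keys, so only has_key? tells them apart; with truthy markers, plain [] is enough.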

If you're not bothered what the hash values are:

irb(main):031:0> a=(1..1_000_000).to_a ; a.length
=> 1000000
irb(main):032:0> h=Hash[a.zip a] ; h.keys.length
=> 1000000

Takes a second or so on my desktop.

Maybe I am misunderstanding the goal here; if you wanted to know if X was in the array, why not do array.include?("X")?

Doing some benchmarking on the suggestions so far shows that chrismear's and Gaius's assignment-based hash creation is slightly faster than my map method (and assigning nil is slightly faster than assigning true). mtyaka's and rampion's Set suggestion is about 35% slower to create.

As far as lookups go, hash.include?(x) is a very tiny amount faster than hash[x]; both are twice as fast as set.include?(x).

                user     system      total        real
chrismear   6.050000   0.850000   6.900000 (  6.959355)
derobert    6.010000   1.060000   7.070000 (  7.113237)
Gaius       6.210000   0.810000   7.020000 (  7.049815)
mtyaka      8.750000   1.190000   9.940000 (  9.967548)
rampion     8.700000   1.210000   9.910000 (  9.962281)

                user     system      total        real
times      10.880000   0.000000  10.880000 ( 10.921315)
set        93.030000  17.490000 110.520000 (110.817044)
hash-i     45.820000   8.040000  53.860000 ( 53.981141)
hash-e     47.070000   8.280000  55.350000 ( 55.487760)

Benchmarking code is:

#!/usr/bin/ruby -w
require 'benchmark'
require 'set'

array = (1..5_000_000).to_a

Benchmark.bmbm(10) do |bm|
    bm.report('chrismear') { hash = {}; array.each{|x| hash[x] = nil} }
    bm.report('derobert')  { hash = Hash[array.map {|x| [x, nil]}] }
    bm.report('Gaius')     { hash = {}; array.each{|x| hash[x] = true} }
    bm.report('mtyaka')    { set = array.to_set }
    bm.report('rampion')   { set = Set.new(array) }
end

hash = Hash[array.map {|x| [x, true]}]
set = array.to_set
array = nil
GC.start

GC.disable
Benchmark.bmbm(10) do |bm|
    bm.report('times')  { 100_000_000.times { } }
    bm.report('set')    { 100_000_000.times { set.include?(500_000) } }
    bm.report('hash-i') { 100_000_000.times { hash.include?(500_000) } }
    bm.report('hash-e') { 100_000_000.times { hash[500_000] } }
end
GC.enable

This preserves 0's if your array was [0,0,0,1,0]:

  hash = {}
  arr.each_with_index{|el, idx| hash.merge!({(idx + 1 )=> el }) }

Returns:

  # {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
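
The same position-to-element mapping can also be written without merge!, which avoids allocating a one-entry hash per element. This is a sketch with a made-up method name, assuming Ruby 2.1 or later for Array#to_h:

```ruby
# Maps 1-based positions to elements, keeping zeros and duplicates intact.
def position_hash(arr)
  arr.each_with_index.map { |el, idx| [idx + 1, el] }.to_h
end

position_hash([0, 0, 0, 1, 0]) # => {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
```

Note that this hash answers "what is at position N?", not "is X in the array?", so it solves a different problem from the one in the question.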

If you're looking for an equivalent of this Perl code:

grep {$_ eq $element} @array

You can just use the simple Ruby code:

array.include?(element)

Here's a neat way to cache lookups with a Hash:

a = (1..1000000).to_a
h = Hash.new{|hash,key| hash[key] = true if a.include? key}

What it does is set a default block for missing keys: on a miss, it stores true in the cache if the key is in the array (and returns nil otherwise). This allows lazy loading into the cache, in case you don't use every element.
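
One consequence worth spelling out: because nothing is stored on a miss, every lookup of an absent key scans the array again. The sketch below makes that visible with a hypothetical CountingArray wrapper (not a real library class) that counts how often include? is called.

```ruby
# Hypothetical wrapper that counts how many times the underlying array is scanned.
class CountingArray
  attr_reader :scans

  def initialize(items)
    @items = items
    @scans = 0
  end

  def include?(value)
    @scans += 1
    @items.include?(value)
  end
end

# Same default block as in the answer above, parameterised on the source collection.
def build_cache(source)
  Hash.new { |hash, key| hash[key] = true if source.include?(key) }
end

# Looks up the same key three times and reports how many scans that cost.
def scans_for(probe, items)
  source = CountingArray.new(items)
  cache = build_cache(source)
  3.times { cache[probe] }
  source.scans
end
```

Here scans_for(2, [1, 2, 3]) should be 1, since the hit is cached after the first lookup, while scans_for(9, [1, 2, 3]) should be 3, since the miss is never stored. If repeated misses matter, one option is to store the result either way with hash[key] = source.include?(key), at the cost of also caching false entries.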
