
How do I count unique names from a txt file in Ruby?

How do I count the unique full names in a .txt file in Ruby, so that each name is counted only once?

These are the first 10 lines of the .txt file:

    Smith, Kim -- ut
    Voluptatem ipsam et at.
    Marv, Gardens -- non
    Facere et necessitatibus animi.
    McLoughlin, Matt -- consequatur
    Eveniet temporibus ducimus amet eaque.
    Smith, Jen -- pariatur
    Unde voluptas sit fugit.
    Brad, Nick -- et
    Maiores ab officia sed.

If you only care about unique items, then what you want is a Set.

For example:

require 'set' # needed before Ruby 3.2, harmless afterwards
names = Set.new(File.readlines('names.txt').map(&:chomp))

This takes the "chomped" version of each line (the line minus its trailing linefeed character) and puts it into the Set. Note that every line of the file goes in, not only the lines that hold names.

Now you can get them all back:

names.sort.each do |name|
  puts name
end
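If the goal is to count full names rather than whole lines, one possible adaptation is sketched below. It assumes that name lines have the form `Last, First -- word`, as in the question's sample, and it reads from an inline string instead of `names.txt` so that it runs on its own. The names `sample` and `name_line` are illustrative and not part of the answer above.

```ruby
require 'set'

# Inline copy of the question's sample (assumption: the real file has the same layout).
sample = <<~TEXT
  Smith, Kim -- ut
  Voluptatem ipsam et at.
  Marv, Gardens -- non
  Facere et necessitatibus animi.
  McLoughlin, Matt -- consequatur
  Eveniet temporibus ducimus amet eaque.
  Smith, Jen -- pariatur
  Unde voluptas sit fugit.
  Brad, Nick -- et
  Maiores ab officia sed.
TEXT

# Hypothetical pattern: "Last, First" followed by " -- ".
name_line = /\A(?<last>[^,]+),\s*(?<first>.+?)\s+--\s/

full_names = Set.new
sample.each_line do |line|
  m = name_line.match(line)
  # Lines without the "Last, First -- " shape are skipped.
  full_names << "#{m[:last]}, #{m[:first]}" if m
end

puts full_names.size
```

For a real file, `File.foreach('names.txt')` could take the place of `sample.each_line`.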

I've assumed that it is the last names that must be unique.

Let's first create the file.

text =<<~END
Smith, Kim
ut Voluptatem ipsam et at.
Marv, Gardens
non Facere et necessitatibus animi.
McLoughlin, Matt
consequatur Eveniet temporibus ducimus amet eaque.
Smith, Jen
pariatur Unde voluptas sit fugit.
Brad, Nick
et Maiores ab officia sed.
END

FName = "test.txt"

File.write(FName, text)
  #=> 239

See IO::write¹. We now read the file and calculate the number of unique last names.

require 'set'

File.foreach(FName).with_index.with_object(Set.new) do |(line, idx),set|
  set << line[/.+(?=,)/] if idx.even?
end.size
  #=> 4

The steps are as follows.

enum1 = File.foreach(FName)
  #=> #<Enumerator: File:foreach("test.txt")> 
enum2 = enum1.with_index
  #=> #<Enumerator: #<Enumerator: File:foreach("test.txt")>:with_index> 
enum3 = enum2.with_object(Set.new)
  #=> #<Enumerator: #<Enumerator: #<Enumerator: 
  #      File:foreach("test.txt")>:with_index>:with_object(#<Set: {}>)> 

See IO::foreach, Enumerator#with_index, Enumerator#with_object and Set::new. Notice that enum2 and enum3 can be thought of as compound enumerators.

The first element is generated by enum3 and passed to the block, and the block variables are assigned values:

(line, idx),set = enum3.next
  #=> [["Smith, Kim\n", 0], #<Set: {}>] 
line
  #=> "Smith, Kim\n" 
idx
  #=> 0 
set
  #=> #<Set: {}> 

line, idx and set are the block variables. The process of breaking enum3.next into its three components is called array decomposition. See this article for a fuller discussion of this important technique.

The block calculation is now performed:
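As a small standalone illustration of array decomposition, the sketch below uses a hand-built nested array in place of `enum3.next`. The symbol `:placeholder_set` and the variable names are illustrative only.

```ruby
# The parentheses on the left mirror the nesting of the array on the right.
(line, idx), set = [["Smith, Kim\n", 0], :placeholder_set]

p line # "Smith, Kim\n"
p idx  # 0
p set  # :placeholder_set

# The same pattern works in block parameters.
pairs = [[["a", 0], :x], [["b", 1], :y]]
letters = pairs.map { |(letter, _index), _tag| letter }
p letters
```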

idx.even?
  #=> true 
s = line[/.+(?=,)/]
  #=> "Smith" 
set << s
  #=> #<Set: {"Smith"}> 

See Integer#even? and Set#<<. In calculating s, the (third form of the) method String#[] is used with the regular expression /.+(?=,)/, which reads, "match one or more characters followed by a comma", (?=,) being a positive lookahead.

The second element is generated by enum3 and passed to the block, the block variables are assigned values and the block calculation is performed:
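The behaviour of that regular expression can be checked in isolation. The sketch below applies it to a few strings; the third string, with two commas, is a made-up input added here to show that `.+` is greedy.

```ruby
pattern = /.+(?=,)/

# The lookahead requires a comma after the match but does not consume it.
last = "Smith, Kim\n"[pattern]
p last     # "Smith"

# With no comma on the line, String#[] returns nil.
no_comma = "ut Voluptatem ipsam et at.\n"[pattern]
p no_comma # nil

# Greedy .+ runs up to the LAST comma on the line (hypothetical input).
greedy = "Smith, Kim, Jr.\n"[pattern]
p greedy   # "Smith, Kim"

# A lazy quantifier stops at the first comma instead.
lazy = "Smith, Kim, Jr.\n"[/.+?(?=,)/]
p lazy     # "Smith"
```

If a line might contain more than one comma, the lazy form (or `/[^,]+/`) would be the safer choice for extracting a last name.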

(line, idx),set = enum3.next
  #=> [["ut Voluptatem ipsam et at.\n", 1], #<Set: {"Smith"}>] 
line
  #=> "ut Voluptatem ipsam et at.\n" 
idx
  #=> 1 
set
  #=> #<Set: {"Smith"}> 
idx.even?
  #=> false 

Since idx.even? #=> false, we skip this line. (Indeed, the only reason for including with_index is to determine which lines have even indices.)

The third element is generated by enum3 and passed to the block, the block variables are assigned values and the block calculation is performed:

(line, idx),set = enum3.next
  #=> [["Marv, Gardens\n", 2], #<Set: {"Smith"}>] 
line
  #=> "Marv, Gardens\n" 
idx
  #=> 2 
set
  #=> #<Set: {"Smith"}> 
idx.even?
  #=> true 
s = line[/.+(?=,)/]
  #=> "Marv" 
set << s
  #=> #<Set: {"Smith", "Marv"}> 

and so on, until we obtain:

arr = File.foreach(FName).with_index.with_object(Set.new) do |(line, idx),set|
  set << line[/.+(?=,)/] if idx.even?
end
  #=> #<Set: {"Smith", "Marv", "McLoughlin", "Brad"}>

Notice that, since sets contain unique values, "Smith" was not added to the set a second time when processing "Smith, Jen". We now perform the final step:

arr.size
  #=> 4
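As a possible variation (not the method used above), the index can be avoided by taking the lines two at a time with `each_slice(2)` and keeping only the first line of each pair. The sketch below works on an inline copy of the text so that it is self-contained; for the file, `File.foreach(FName)` would take the place of `text.each_line`.

```ruby
require 'set'

# Same layout as test.txt: a name line followed by a filler line.
text = <<~TEXT
  Smith, Kim
  ut Voluptatem ipsam et at.
  Marv, Gardens
  non Facere et necessitatibus animi.
  McLoughlin, Matt
  consequatur Eveniet temporibus ducimus amet eaque.
  Smith, Jen
  pariatur Unde voluptas sit fugit.
  Brad, Nick
  et Maiores ab officia sed.
TEXT

# Each slice is [name_line, filler_line]; only the name line is used.
last_names = text.each_line.each_slice(2).map do |name_line, _filler|
  name_line[/.+(?=,)/]
end.to_set

puts last_names.size
```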

¹ Even though write is a method of IO, it is customary to write it (and other IO methods) with File as its receiver. This is permissible because File is a subclass of IO, and therefore inherits the latter's methods. The two colons in IO::write signify that write is a class method. By contrast, the pound sign in IO#gets indicates that gets is an instance method.
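The relationships described in the footnote can be checked from Ruby itself. The variable names in this short sketch are illustrative.

```ruby
# File inherits directly from IO.
file_is_io_subclass = (File.superclass == IO)

# write is callable on the class itself, so File.write works.
file_has_class_write = File.respond_to?(:write)

# gets is defined for instances of IO (and therefore of File).
io_has_instance_gets = IO.instance_methods.include?(:gets)

p file_is_io_subclass
p file_has_class_write
p io_has_instance_gets
```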

You would first have to work out which lines are names and which are not. Then you can push each name into an array only if the array does not already contain it.

array.push(name) unless array.include?(name)

Then just count the array:

array.count
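Put together, that approach might look like the sketch below. The `names` list is made-up input that stands in for the names already extracted from the file.

```ruby
# Hypothetical input: names already extracted from the file, with one repeat.
names = ["Smith, Kim", "Marv, Gardens", "Smith, Kim", "Brad, Nick"]

array = []
names.each do |name|
  # Array#include? scans the whole array, so this loop is quadratic overall.
  array.push(name) unless array.include?(name)
end
manual_count = array.count

# Array#uniq gives the same result in one call.
uniq_count = names.uniq.count

puts manual_count
puts uniq_count
```

For large files, `uniq` or a Set is likely to be faster than repeated `include?` checks, because both rely on hashing rather than a linear scan.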
