Move data into 3 separate hashes inside a loop in Ruby

It's only my second post and I'm still learning Ruby. I'm trying to figure this out based on my Java knowledge, but I can't seem to get it right.

What I need to do: I have a function that reads a file line by line and extracts different car features from each line, for example:

def convertListings2Catalogue(fileName)
  f = File.open(fileName, "r")
  f.each_line do |line|
    km = line[/[0-9]+km/]
    t = line[Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i)]
    trans = ....
  end
end

Now, for each line, I need to store the extracted features in separate hashes that I can access later in my program.

The issues I'm facing: 1) I'm overwriting the features in the same hash; 2) I can't access the hash outside my function.

This is what's in my file:

65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}

coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used

AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE

My function now extracts the features of each car, but I need to store these features in 3 different hashes with the same keys but different values.

listing = Hash.new(0)
  listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }

I tried moving the data from one hash to another, and I even tried storing the data in an array first and then moving it into the hash, but I still can't figure out how to create separate hashes inside a loop.
Thanks 谢谢

You could leverage the fact that your file instance is an Enumerable (IO includes the Enumerable module). This lets you use the inject method, seeding it with an empty hash. collector in this case is the hash that gets passed along as the iteration continues. Be sure to return the value of collector (implicitly, by making collector the last line of the block), as inject uses that value as the collector for the next iteration. It's some pretty powerful stuff!
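A minimal, self-contained sketch of that inject pattern, assuming a tiny hypothetical two-line file (a model name followed by a braced feature list, a format made up for illustration) written to a temporary file:

```ruby
require 'tempfile'

# Hypothetical sample data: a model name followed by a braced feature list
sample = "camry,{AC, Heated Seats}\ncrv,{Heated Mirrors}\n"

# Tempfile.create with a block returns the block's value and deletes the file afterwards
features_by_model = Tempfile.create('listings') do |tmp|
  tmp.write(sample)
  tmp.rewind # back to the start so the lines can be read

  # IO includes Enumerable, so inject walks the file line by line
  tmp.inject({}) do |collector, line|
    model = line[/\A\w+/]
    collector[model] = line[/\{(.*?)\}/, 1] # capture group 1: text inside the braces
    collector # returned so inject passes it to the next iteration
  end
end

p features_by_model
```

If the block ended with the assignment `collector[model] = ...` instead, inject would receive the assigned String, and the next iteration would try to index into that String and raise an error.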

I think this is roughly what you're going for. I used model as the key in the hash and set_of_features as the value.

def convertListings2Catalogue(fileName)
  # With a block, File.open closes the file when done and returns the block's value
  File.open(fileName, "r") do |f|
    f.inject({}) do |collector, line|
      km = line[/[0-9]+km/]
      t = line[Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i)]
      trans = line[Regexp.union(/auto/i, /manual/i, /steptronic/i)]
      dt = line[Regexp.union(/fwd/i, /rwd/i, /awd/i)]
      status = line[Regexp.union(/used/i, /new/i)]
      car_maker = line[Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i)]
      stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
      year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
      trim = line.scan(/\b[a-zA-Z]{2}\b/).first
      fuel = line.scan(/[\d.]+L\/\d*km/).first
      set_of_features = line.scan(/\{(.*?)\}/).first
      model = line[Regexp.union(/camry/i, /clk/i, /crv/i)]
      collector[model] = set_of_features
      collector # becomes collector on the next iteration
    end
  end
end

Hope I understood your question correctly. I would do it as below. Now every time you call this method, it will return an array containing one hash per listing.

def convertListings2Catalogue(fileName)
  listings = []

  File.open(fileName, "r") do |f|
    f.each_line do |line|
      km = line[/[0-9]+km/]
      t = line[Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i)]
      trans = line[Regexp.union(/auto/i, /manual/i, /steptronic/i)]
      dt = line[Regexp.union(/fwd/i, /rwd/i, /awd/i)]
      status = line[Regexp.union(/used/i, /new/i)]
      car_maker = line[Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i)]
      stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
      year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
      trim = line.scan(/\b[a-zA-Z]{2}\b/).first
      fuel = line.scan(/[\d.]+L\/\d*km/).first
      set_of_features = line.scan(/\{(.*?)\}/).first
      model = line[Regexp.union(/camry/i, /clk/i, /crv/i)]

      # A new hash literal is built on every iteration, so nothing is overwritten
      listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }
      listings.push listing
    end
  end

  # Return only after every line has been read (a return inside the loop stops after the first line)
  listings
end

Then, wherever you use it, you can just do:

listings = convertListings2Catalogue("somefile.txt")
listings.first # to get the first listing
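To tie this back to the "3 separate hashes" in the question: because the method returns an array, the caller can pull each listing out as its own hash. A minimal sketch, assuming a hypothetical, trimmed-down `extract_listings` that extracts only two of the features, run on shortened versions of the question's lines:

```ruby
require 'tempfile'

# Hypothetical, trimmed-down extractor: one new hash per line, collected by map
def extract_listings(file_name)
  File.foreach(file_name).map do |line|
    {
      kilometers: line[/[0-9]+km/],
      type: line[Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i)]
    }
  end
end

data = <<~LINES
  65101km,Sedan,Manual,2010
  coupe,1100km,auto,RWD
  AWD,SUV,0km,auto,new
LINES

listings = Tempfile.create('listings') do |tmp|
  tmp.write(data)
  tmp.flush # make sure the data is on disk before File.foreach reopens the path
  extract_listings(tmp.path)
end

# Destructure the returned array into three separate hashes
first, second, third = listings
puts first[:type]  # Sedan
puts second[:type] # coupe
puts third[:type]  # SUV
```

Since the hashes live in the returned array rather than in a variable local to the method, they stay accessible wherever the result is stored.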

I don't fully understand the question, but I thought it was important to suggest how you might deal with a more fundamental issue: extracting the desired information from each line of the file in an effective and Ruby-like manner. Once you have that information in the form of an array of hashes, one hash per line, you can do with it what you want. Alternatively, you could loop through the lines in the file, constructing a hash for each line and performing any desired operations before going on to the next line.

Being new to Ruby, you will undoubtedly find some of the code below difficult to understand. If you persevere, however, I think you will be able to understand all of it, and in the process learn a lot about Ruby. I've made some suggestions in the last section of my answer to help you decipher the code.

Code

words_by_key = {
  type:         %w| sedan coupe hatchback station suv |,
  transmission: %w| auto manual steptronic |,
  drivetrain:   %w| fwd rwd awd |,
  status:       %w| used new |,
  car_maker:    %w| honda toyota mercedes bmw lexus |,
  model:        %w| camry clk crv |
}
  #=> {:type=>["sedan", "coupe", "hatchback", "station", "suv"],
  #    :transmission=>["auto", "manual", "steptronic"],
  #    :drivetrain=>["fwd", "rwd", "awd"],
  #    :status=>["used", "new"],
  #    :car_maker=>["honda", "toyota", "mercedes", "bmw", "lexus"],
  #    :model=>["camry", "clk", "crv"]}

WORDS_TO_KEYS = words_by_key.each_with_object({}) { |(k,v),h| v.each { |s| h[s] = k } }
  #=> {"sedan"=>:type, "coupe"=>:type, "hatchback"=>:type, "station"=>:type, "suv"=>:type,
  #    "auto"=>:transmission, "manual"=>:transmission, "steptronic"=>:transmission,
  #    "fwd"=>:drivetrain, "rwd"=>:drivetrain, "awd"=>:drivetrain,
  #    "used"=>:status, "new"=>:status,
  #    "honda"=>:car_maker, "toyota"=>:car_maker, "mercedes"=>:car_maker,
  #      "bmw"=>:car_maker, "lexus"=>:car_maker,
  #    "camry"=>:model, "clk"=>:model, "crv"=>:model}

module ExtractionMethods
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  def year(str)
    str[/\A\d{4}\z/]
  end

  def stock(str)
    return nil if str.end_with?('km')
    str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
  end

  def trim(str)
    str[/\A\p{Alpha}{2}\z/]
  end

  def fuel_consumption(str)
    str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
  end
end

class K
  include ExtractionMethods      
  def extract_hashes(fname)
    File.foreach(fname).with_object([]) do |line, arr|
      line = line.downcase
      idx_left = line.index('{')
      idx_right = line.index('}')
      if idx_left && idx_right    
        g = { set_of_features: line[idx_left..idx_right] }
        line[idx_left..idx_right] = ''
        line.squeeze!(',')
      else
        g = {}
      end
      arr << line.split(',').each_with_object(g) do |word, h|
        word.strip!
        if WORDS_TO_KEYS.key?(word)
          h[WORDS_TO_KEYS[word]] = word
        else
          ExtractionMethods.instance_methods.find do |m|
            v = public_send(m, word)
            (h[m] = v) unless v.nil?
            v
          end
        end
      end
    end
  end
end

Example

data =<<BITTER_END
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
BITTER_END

FILE_NAME = 'temp'
File.write(FILE_NAME, data)
  #=> 353 (characters written to file)

k = K.new
  #=> #<K:0x00000001c257d348>
k.extract_hashes(FILE_NAME)
  #=> [{:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry}",
  #     :km=>"65101", :type=>"sedan", :transmission=>"manual", :year=>"2010",
  #     :stock=>"18131a", :drivetrain=>"fwd", :status=>"used", :fuel_consumption=>5.5,
  #     :car_maker=>"toyota", :model=>"camry", :trim=>"se"},
  #    {:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry, power seats}",
  #     :type=>"coupe", :km=>"1100", :transmission=>"auto", :drivetrain=>"rwd",
  #     :model=>"clk", :trim=>"lx", :stock=>"18fo724a", :year=>"2017",
  #     :fuel_consumption=>6.0, :status=>"used"},
  #    {:set_of_features=>"{heated seats, heated mirrors, keyless entry}",
  #     :drivetrain=>"awd", :type=>"suv", :km=>"0", :transmission=>"auto",
  #     :status=>"new", :car_maker=>"honda", :model=>"crv", :fuel_consumption=>8.0,
  #     :stock=>"19bf723a", :year=>"2018", :trim=>"le"}]

Explanation

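Firstly, note that the heredoc is opened with <<BITTER_END (not <<~BITTER_END), so its lines and the BITTER_END terminator must start in the first column; if the example gets indented when you paste it, un-indent it before running it.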

You will see that the instance method K#extract_hashes uses IO#foreach to read the file line by line.[1]

The first step in processing each line of the file is to downcase it. You will then want to split the string on commas to form an array of words. There is a problem, however: you don't want to split on the commas between a left and right brace ({ and }), which enclose the value for the key :set_of_features. I decided to deal with that by determining the indices of the two braces, creating a hash with the single key :set_of_features, deleting that substring from the line and, lastly, replacing the resulting pair of adjacent commas with a single comma:

  idx_left = line.index('{')
  idx_right = line.index('}')
  if idx_left && idx_right    
    g = { set_of_features: line[idx_left..idx_right] }
    line[idx_left..idx_right] = ''
    line.squeeze!(',')
  else
    g = {}
  end
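To see this step on real input, here is the same snippet applied to the third line of the question's file (a standalone sketch; g and line are just local variables here rather than part of the full method):

```ruby
# Third line of the question's file; downcase returns a new, mutable string
line = "AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE\n".downcase

idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
  g = { set_of_features: line[idx_left..idx_right] }
  line[idx_left..idx_right] = '' # leaves ",," where the braced text was
  line.squeeze!(',')             # collapses ",," into ","
else
  g = {}
end

p g
p line.split(',')
```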

See the documentation for the class String for the String methods used here and elsewhere.

We can now convert the resulting line to an array of words by splitting on the commas. If any capitalization is desired in the output, it should be done after the hashes have been constructed.
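For example, a hypothetical post-processing step that capitalizes the String values once a hash has been built (which values to capitalize, and whether to use capitalize or upcase, is an assumption left to you):

```ruby
# One listing hash as extract_hashes might build it (all strings downcased)
listing = { type: "sedan", car_maker: "toyota", model: "camry", fuel_consumption: 5.5 }

# Capitalize only String values; numbers such as fuel_consumption pass through unchanged
display = listing.transform_values { |v| v.is_a?(String) ? v.capitalize : v }

puts display[:car_maker] # Toyota
```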

We will build on the hash g just created (either { set_of_features: line[idx_left..idx_right] } or, if the line has no braces, an empty hash). When complete, it will be appended to the array being returned.

Each element (word) of the array is then processed. If it is a key of the hash WORDS_TO_KEYS, we set

h[WORDS_TO_KEYS[word]] = word

and are finished with that word. If not, we call each of the instance methods m in the module ExtractionMethods on word (public_send(m, word)) until one returns a value v that is not nil. When that happens, another key-value pair is added to the hash h:

h[m] = v

Notice that the name of each instance method in ExtractionMethods, which is a symbol (e.g., :km), is used as a key in the hash h. Having separate methods also facilitates debugging and testing.
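As an illustration of that testing benefit, here is a sketch that mixes a copy of ExtractionMethods into a bare object and checks each method in isolation on words from the example data (the copy uses \d{4} for year):

```ruby
# Copy of the answer's ExtractionMethods
module ExtractionMethods
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  def year(str)
    str[/\A\d{4}\z/]
  end

  def stock(str)
    return nil if str.end_with?('km')
    str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
  end

  def trim(str)
    str[/\A\p{Alpha}{2}\z/]
  end

  def fuel_consumption(str)
    str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
  end
end

# A bare object with the methods mixed in, so each can be called directly
checker = Object.new.extend(ExtractionMethods)

p checker.km('65101km')                  # "65101"
p checker.year('2010')                   # "2010"
p checker.stock('18fo724a')              # "18fo724a"
p checker.stock('1100km')                # nil
p checker.trim('lx')                     # "lx"
p checker.fuel_consumption('5.5l/100km') # 5.5
```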

I could have written:

if    (s = km(word))
  h[:km] = s
elsif (s = year(word))
  h[:year] = s
elsif (s = stock(word))
  h[:stock] = s
elsif (s = trim(word))
  h[:trim] = s
elsif (s = fuel_consumption(word))
  h[:fuel_consumption] = s
end

but since all these methods take the same argument, word, we can instead use Object#public_send:

a = [:km, :year, :stock, :trim, :fuel_consumption]

a.find do |m|
  v = public_send(m, word)
  (h[m] = v) unless v.nil?
  v 
end

A final tweak is to put all the methods in the array a into a module ExtractionMethods and include that module in the class K. We can then replace a in the find expression above with ExtractionMethods.instance_methods. (See Module#instance_methods.)
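A small sketch of that final form, assuming a hypothetical two-method module Parsers standing in for ExtractionMethods and a hypothetical class Demo standing in for K:

```ruby
# Hypothetical two-method stand-in for ExtractionMethods
module Parsers
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  def year(str)
    str[/\A\d{4}\z/]
  end
end

class Demo
  include Parsers

  # Tries each Parsers method in turn; find stops at the first non-nil result
  def classify(word)
    h = {}
    Parsers.instance_methods.find do |m|
      v = public_send(m, word)
      h[m] = v unless v.nil?
      v
    end
    h
  end
end

demo = Demo.new
p Parsers.instance_methods.sort # [:km, :year]
p demo.classify('1100km')
p demo.classify('lx')
```

Because no word can satisfy both patterns, the order in which instance_methods lists the methods does not affect the result here.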

Suppose now that the data are changed so that additional fields are added (e.g., for "colour" or "price"). Then the only modifications required to the code are changes to words_by_key and/or the addition of methods to ExtractionMethods.
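A sketch of the first kind of change, assuming a hypothetical :colour field with made-up colour words; only words_by_key changes, and the inverted lookup picks up the new words automatically (lowercase local names are used here instead of the WORDS_TO_KEYS constant):

```ruby
# Hypothetical extension: a :colour field added to words_by_key
words_by_key = {
  drivetrain: %w[fwd rwd awd],
  colour:     %w[red white black]
}

# Same inversion as WORDS_TO_KEYS: each word maps to its field name
words_to_keys = words_by_key.each_with_object({}) { |(k, v), h| v.each { |s| h[s] = k } }

listing = "awd,red,0km".split(',').each_with_object({}) do |word, h|
  h[words_to_keys[word]] = word if words_to_keys.key?(word)
end

p listing
```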

Understanding the code

It may be helpful to run the code with puts statements inserted. For example,

idx_left = line.index('{')
idx_right = line.index('}')
puts "idx_left=#{idx_left}, idx_right=#{idx_right}"

Where code is chained, it may be helpful to break it up with temporary variables and insert puts statements. For example, change

arr << line.split(',').each_with_object(g) do |word, h|
  ...

to

a = line.split(',')
puts "line.split(',')=#{a}"
enum = a.each_with_object(g)
puts "enum.to_a=#{enum.to_a}"
arr << enum.each do |word, h|
  ...

The second puts here is merely to see which elements the enumerator enum will generate and pass to the block.

Another way of doing that is to use the handy method Object#tap, which is inserted between two method calls:

arr << line.split(',').tap { |a| puts "line.split(',')=#{a}"}.
            each_with_object(g) do |word, h|
              ...

tap (great name, eh?), as used here, yields its receiver to the block (which displays it) and then returns the receiver unchanged.
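A tiny standalone illustration, using a fragment of the question's second line:

```ruby
# tap yields the intermediate array to the block (here, to print it),
# then returns that same array so the chain continues unchanged
words = "coupe,1100km,auto".split(',')
                           .tap { |a| puts "split => #{a.inspect}" }
                           .map(&:upcase)

p words # ["COUPE", "1100KM", "AUTO"]
```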

Lastly, I've used the method Enumerable#each_with_object in a couple of places. It may seem complex, but it's actually quite simple. For example,

arr << line.split(',').each_with_object(g) do |word, h|
  ...
end

is effectively equivalent to:

h = g
line.split(',').each do |word|
  ...
end
arr << h
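The equivalence can be checked directly on a few words from the example (the lookup hash below is a hypothetical cut-down WORDS_TO_KEYS):

```ruby
words  = %w[sedan fwd used]
lookup = { "sedan" => :type, "fwd" => :drivetrain, "used" => :status }

# each_with_object passes the same hash to every iteration and returns that hash
with_object = words.each_with_object({}) { |w, h| h[lookup[w]] = w }

# Hand-written equivalent: create the hash, fill it with each, then use it
# (each itself returns words, its receiver, not the hash)
h = {}
words.each { |w| h[lookup[w]] = w }

p with_object == h # true
```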

[1] Many IO methods are typically invoked on File. This is acceptable because File.superclass #=> IO.
