
Ruby regex into array of hashes but need to drop a key/val pair

I'm trying to parse a file containing a name followed by a hierarchy path. I want to take the named regex matches, turn them into Hash keys, and store the match as a hash. Each hash will get pushed to an array (so I'll end up with an array of hashes after parsing the entire file). This part of the code is working, except now I need to handle bad paths with duplicated hierarchy (top_* is always the top level). It appears that if I'm using named backreferences in Ruby, I need to name all of the backreferences. I have gotten the match working in Rubular, but now I have the p1 backreference in my resultant hash.

Question: What's the easiest way to not include the p1 key/value pair in the hash? My method is used in other places, so we can't assume that p1 always exists. Am I stuck with dropping each key/value pair in the array after calling the s_ary_to_hash method?

NOTE: I'm keeping this question to try and solve the specific issue of ignoring certain hash keys in my method. The regex issue is now in this ticket: Ruby regex - using optional named backreferences

UPDATE: The regex issue is solved; the hier is now always stored in the named 'hier' group. The only item remaining is to figure out how to drop the 'p1' key/value, if it exists, prior to creating the Hash.

Example file:

name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops  top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog

Expected output:

[{:name => "name1", :hier => "top_cat/mouse/dog/elephant/horse"},
 {:name => "new12", :hier => "top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
 {:name => "tops",  :hier => "top_bat/car[0]"},
 {:name => "ab123", :hier => "top_2/top_1/top_3/top_4/dog"}]

Code snippet:

def s_ary_to_hash(ary, regex)
  retary = Array.new
  ary.each {|x| (retary << Hash[regex.match(x).names.map{|key| key.to_sym}.zip(regex.match(x).captures)]) if regex.match(x)}
  return retary
end

regex = %r{(?<name>\w+) (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|((?<= ).*$))}
h_ary = s_ary_to_hash(File.readlines(filename), regex)

What about this regex?

^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$

Demo

http://rubular.com/r/awEP9Mz1kB

Sample code

def s_ary_to_hash(ary, regex, mappings)
   retary = Array.new

   for item in ary
      tmp = regex.match(item)
      if tmp then
         hash = Hash.new
         retary.push(hash)
         mappings.each { |mapping|
            mapping.each { |key, groups|
              # Use the first listed group that actually matched.
              for group in groups
                 if tmp[group] then
                     hash[key] = tmp[group]
                     break
                 end
              end
            }
         }
      end
   end

  return retary
end

regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
h_ary = s_ary_to_hash(
   File.readlines(filename), 
   regex,
   [ 
      {:name => ['name']},
      {:hier => ['hier','p1']}
   ]
)

puts h_ary

Output

{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse\r"}
{:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool\r"}
{:name=>"tops", :hier=>"top_bat/car[0]"}

Discussion

Since Ruby 2.0.0 doesn't support branch reset, I have built a solution that adds some more power to the s_ary_to_hash function. It now accepts a third parameter indicating how to build the final array of hashes.

This third parameter is an array of hashes. Each hash in this array has one key (K) corresponding to a key in the hashes of the final array. K is associated with an array containing the named groups to use from the passed regex (the second parameter of s_ary_to_hash).

If a group equals nil, s_ary_to_hash skips it and tries the next group.

If all groups equal nil, K is not added to the resulting hash. Feel free to modify s_ary_to_hash if this isn't the desired behavior.
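If the only goal is to keep certain named groups (such as p1) out of the result, a lighter variation is to pass the names to skip. The sketch below is not from the original posts: the optional ignore parameter and the two simplified sample regexes are assumptions made for illustration. A name in ignore that the regex does not define is never encountered, so the method still works where p1 does not exist.

```ruby
# Hypothetical variant of s_ary_to_hash: `ignore` is an assumed extra
# parameter listing the named groups to leave out of each hash.
def s_ary_to_hash(ary, regex, ignore = [])
  ignore = ignore.map(&:to_s)
  ary.each_with_object([]) do |line, result|
    m = regex.match(line)
    next unless m # skip lines that do not match at all

    # Pair each group name with its capture, dropping the ignored names
    # before the hash is built.
    pairs = m.names.zip(m.captures).reject { |name, _| ignore.include?(name) }
    result << pairs.map { |name, value| [name.to_sym, value] }.to_h
  end
end

# Simplified sample regexes (assumptions, not the regexes from the question).
with_p1    = %r{(?<name>\w+)\s+(?<p1>top_\w+)/(?<hier>\S+)}
without_p1 = /(?<name>\w+)\s+(?<hier>\S+)/

p s_ary_to_hash(["name1 top_cat/mouse/dog"], with_p1, [:p1])
p s_ary_to_hash(["tops  top_bat/car[0]"], without_p1, [:p1])
```

Because the filtering happens on the name/capture pairs, nothing has to be deleted from the array of hashes afterwards.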

Edit: I've changed the method s_ary_to_hash to conform with what I now understand to be the criterion for excluding directories: directory d is to be excluded if there is a downstream directory with the same name, or with the same name followed by a non-negative integer in brackets. I've applied that to all directories, though I may have misunderstood the question; perhaps it should apply only to the first.

data =<<THE_END
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops  top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
THE_END

text = data.split("\n")

def s_ary_to_hash(ary)
  ary.map do |s|
    # Split on the first space: the name comes first, the path after it.
    name, _, downstream_path = s.partition(' ').map(&:strip)
    arr = []
    downstream_dirs = downstream_path.split('/')
    while downstream_dirs.any? do
      dir = downstream_dirs.shift
      # Keep dir only if no later directory repeats it (with or without [n]).
      arr << dir unless downstream_dirs.any? { |d|
        d == dir || d =~ /#{dir}\[\d+\]/ }
    end
    { name: name, hier: arr.join('/') }
  end
end

s_ary_to_hash(text)
  # => [{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse"},
  #     {:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
  #     {:name=>"tops", :hier=>"top_bat/car[0]"},
  #     {:name=>"ab123", :hier=>"top_2/top_1/top_3/top_4/dog"}] 

The exclusion criterion is implemented in downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }, where dir is the directory being tested and downstream_dirs is an array of all the downstream directories. (When dir is the last directory, downstream_dirs is empty.) Localizing it in this way makes it easy to test and change the exclusion criterion. You could shorten this to a single regex and/or make it a method:

def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end
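As one possible reading of the "single regex" suggestion, here is a hedged sketch. It is not the answer's code: the method name exclude_dir_strict? is made up for illustration, and it is deliberately stricter than the original test. Because dir is interpolated into the pattern, a name such as hat[1] would otherwise be read as a character class, so the sketch escapes it with Regexp.escape and anchors the match to the whole directory name.

```ruby
# Hypothetical stricter variant; the name exclude_dir_strict? is not from
# the original answer.
def exclude_dir_strict?(dir, downstream_dirs)
  # Regexp.escape keeps characters such as [ and ] in dir literal;
  # \A and \z require the entire directory name to match, and the
  # optional group accepts a trailing [n] index.
  pattern = /\A#{Regexp.escape(dir)}(?:\[\d+\])?\z/
  downstream_dirs.any? { |d| d =~ pattern }
end

p exclude_dir_strict?("hat", ["top_ab12", "hat[1]"]) # => true
p exclude_dir_strict?("hat", ["xhat[1]y"])           # => false
p exclude_dir_strict?("dog", [])                     # => false
```

The unanchored original would treat xhat[1]y as a repeat of hat; whether that matters depends on the directory names that can occur in the input.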

Here is a non-regexp solution:

result = string.each_line.map do |line|
  name, path = line.split(' ')
  path = path.split('/')
  last_occur_of_root = path.rindex(path.first)
  path = path[last_occur_of_root..-1]
  {name: name, hier: path.join('/')}
end
