Ruby regex into array of hashes but need to drop a key/val pair
I'm trying to parse a file in which each line contains a name followed by a hierarchy path. I want to take the named regex matches, turn the group names into Hash keys, and store the matched text as the values. Each hash is pushed to an array, so I end up with an array of hashes after parsing the entire file. This part of the code works, but now I need to handle bad paths with duplicated hierarchy (top_* is always the top level). It appears that if I use named backreferences in Ruby, I need to name all of the backreferences. I have gotten the match working in Rubular, but now I have the p1 backreference in my resultant hash.
Question: What's the easiest way to not include the p1 key/value pair in the hash? My method is used in other places, so we can't assume that p1 always exists. Am I stuck with dropping the p1 key/value pair from each hash in the array after calling the s_ary_to_hash method?
NOTE: I'm keeping this question open to solve the specific issue of ignoring certain hash keys in my method. The regex issue is now in this ticket: Ruby regex - using optional named backreferences
UPDATE: The regex issue is solved; the hier is now always stored in the named 'hier' group. The only item remaining is to figure out how to drop the 'p1' key/value, if it exists, prior to creating the Hash.
Example file:
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Expected output:
[{:name => "name1", :hier => "top_cat/mouse/dog/elephant/horse"},
{:name => "new12", :hier => "top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
{:name => "tops", :hier => "top_bat/car[0]"},
{:name => "ab123", :hier => "top_2/top_1/top_3/top_4/dog"}]
Code snippet:
def s_ary_to_hash(ary, regex)
  retary = Array.new
  ary.each {|x| (retary << Hash[regex.match(x).names.map{|key| key.to_sym}.zip(regex.match(x).captures)]) if regex.match(x)}
  return retary
end
regex = %r{(?<name>\w+) (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|((?<= ).*$))}
h_ary = s_ary_to_hash(File.readlines(filename), regex)
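One way to skip specific named groups is to filter the name/capture pairs before they are passed to Hash[]. The following is only a minimal sketch: the `ignore` parameter and the simplified demo regex are assumptions made for illustration and are not part of the original method. Because `ignore` defaults to an empty array, existing callers that never produce a p1 group would be unaffected.

```ruby
# Sketch only: `ignore` is a hypothetical third parameter listing keys to drop.
def s_ary_to_hash(ary, regex, ignore = [])
  ary.each_with_object([]) do |line, result|
    match = regex.match(line)          # match once instead of three times
    next unless match                  # skip lines that do not match

    pairs = match.names.map(&:to_sym).zip(match.captures)
    result << Hash[pairs.reject { |key, _| ignore.include?(key) }]
  end
end

# Simplified demo regex (an assumption, not the regex from the question).
regex = /(?<name>\w+) (?<p1>top_\w+)\/(?<hier>.+)/
rows = s_ary_to_hash(['name1 top_cat/mouse/dog', 'no match here'], regex, [:p1])

puts rows.length             # prints 1
puts rows.first.key?(:p1)    # prints false
```

Filtering before the hash is built avoids a second pass over the array to delete keys afterwards.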
What about this regex?
^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$
http://rubular.com/r/awEP9Mz1kB
def s_ary_to_hash(ary, regex, mappings)
  retary = Array.new
  for item in ary
    tmp = regex.match(item)
    if tmp then
      hash = Hash.new
      retary.push(hash)
      mappings.each { |mapping|
        mapping.each { |key, groups|
          # Use the first named group that actually matched.
          for group in groups
            if tmp[group] then
              hash[key] = tmp[group]
              break
            end
          end
        }
      }
    end
  end
  return retary
end
regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
h_ary = s_ary_to_hash(
File.readlines(filename),
regex,
[
{:name => ['name']},
{:hier => ['hier','p1']}
]
)
puts h_ary
{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse\r"}
{:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool\r"}
{:name=>"tops", :hier=>"top_bat/car[0]"}
Since Ruby 2.0.0 doesn't support branch reset, I have built a solution that adds more power to the s_ary_to_hash function. It now accepts a third parameter indicating how to build the final array of hashes.
This third parameter is an array of hashes. Each hash in this array has one key (K) corresponding to a key in the final array of hashes. K is associated with an array containing the named groups to use from the passed regex (the second parameter of the s_ary_to_hash function).
If a group equals nil, s_ary_to_hash skips it and moves on to the next group.
If all groups equal nil, K is not added to the resulting hash. Feel free to modify s_ary_to_hash if this isn't the desired behavior.
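The fallback rule described above ("the first non-nil group wins, and the key is omitted if every group is nil") can also be written more compactly. This is a sketch, not the answer's code: `build_hash` is a hypothetical helper, and it assumes the mappings are passed as a single hash rather than an array of one-key hashes.

```ruby
# Hypothetical helper: build one result hash from a MatchData object.
def build_hash(match, mappings)
  mappings.each_with_object({}) do |(key, groups), hash|
    group = groups.find { |g| match[g] }  # first named group that matched
    hash[key] = match[group] if group     # omit the key when all groups are nil
  end
end

regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
mappings = { name: ['name'], hier: ['hier', 'p1'] }

# No duplicated hierarchy: 'hier' is nil, so :hier falls back to 'p1'.
plain = build_hash(regex.match('tops top_bat/car[0]'), mappings)
puts plain[:hier]    # prints top_bat/car[0]

# Duplicated hierarchy: 'hier' matched, so it takes priority over 'p1'.
dup = build_hash(regex.match('name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse'), mappings)
puts dup[:hier]      # prints top_cat/mouse/dog/elephant/horse
```

Because p1 only ever appears in the list of candidate groups, it never becomes a key of the result.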
Edit: I've changed the method s_ary_to_hash to conform with what I now understand to be the criterion for excluding directories, namely, directory d is to be excluded if there is a downstream directory with the same name, or the same name followed by a non-negative integer in brackets. I've applied that to all directories, though I may have misunderstood the question; perhaps it should apply only to the first.
data =<<THE_END
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
THE_END
text = data.split("\n")
def s_ary_to_hash(ary)
  ary.map do |s|
    name, _, downstream_path = s.partition(' ').map(&:strip)
    arr = []
    downstream_dirs = downstream_path.split('/')
    while downstream_dirs.any? do
      dir = downstream_dirs.shift
      # Keep dir only if no later directory repeats it (with or without an index).
      arr << dir unless downstream_dirs.any? { |d|
        d == dir || d =~ /#{dir}\[\d+\]/ }
    end
    { name: name, hier: arr.join('/') }
  end
end
s_ary_to_hash(text)
# => [{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse"},
# {:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
# {:name=>"tops", :hier=>"top_bat/car[0]"},
# {:name=>"ab123", :hier=>"top_2/top_1/top_3/top_4/dog"}]
The exclusion criterion is implemented in downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }, where dir is the directory being tested and downstream_dirs is an array of all the downstream directories. (When dir is the last directory, downstream_dirs is empty.) Localizing it in this way makes it easy to test and change the exclusion criterion. You could shorten this to a single regex and/or make it a method:
def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end
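As an illustration of the "single regex" variant, here is one possible sketch. It is a variation rather than the code above: it escapes dir with Regexp.escape (so a name such as hat[1] is treated literally) and anchors the pattern to the whole directory name, which is slightly stricter than the unanchored match in the original criterion.

```ruby
# Sketch: exclusion test using one anchored regex with an optional "[n]" suffix.
def exclude_dir?(dir, downstream_dirs)
  pattern = /\A#{Regexp.escape(dir)}(\[\d+\])?\z/
  downstream_dirs.any? { |d| d =~ pattern }
end

puts exclude_dir?('hat', ['top_ab12', 'hat[1]'])                    # prints true
puts exclude_dir?('top_ab12', ['path0_top_ab12', 'top_ab12path1'])  # prints false
puts exclude_dir?('dog', [])                                        # prints false
```

The anchors prevent a directory such as top_ab12 from being excluded merely because a later name contains it as a substring.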
Here is a non-regexp solution:
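# string holds the contents of the file
result = string.each_line.map do |line|
  name, path = line.split(' ')
  path = path.split('/')
  # Keep everything from the last occurrence of the top-level directory onward.
  last_occur_of_root = path.rindex(path.first)
  path = path[last_occur_of_root..-1]
  {name: name, hier: path.join('/')}
end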