I'm trying to parse a file containing a name followed by a hierarchy path. I want to take the named regex matches, turn them into Hash keys, and store the match as a hash. Each hash will get pushed to an array (so I'll end up with an array of hashes after parsing the entire file). This part of the code is working, except now I need to handle bad paths with duplicated hierarchy (top_* is always the top level). It appears that if I'm using named backreferences in Ruby I need to name all of the backreferences. I have gotten the match working in Rubular, but now I have the 'p1' backreference in my resultant hash.
Question: What's the easiest way to not include the 'p1' key/value pair in the hash? My method is used in other places, so we can't assume that 'p1' always exists. Am I stuck with dropping each key/value pair in the array after calling the s_ary_to_hash method?
NOTE: I'm keeping this question to try and solve the specific issue of ignoring certain hash keys in my method. The regex issue is now in this ticket: Ruby regex - using optional named backreferences
UPDATE: Regex issue is solved, the hier is now always stored in the named 'hier' group. The only item remaining is to figure out how to drop the 'p1' key/value if it exists prior to creating the Hash.
Example file:
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Expected output:
[{:name => "name1", :hier => "top_cat/mouse/dog/elephant/horse"},
{:name => "new12", :hier => "top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
{:name => "tops", :hier => "top_bat/car[0]"},
{:name => "ab123", :hier => "top_2/top_1/top_3/top_4/dog"}]
Code snippet:
def s_ary_to_hash(ary, regex)
  retary = Array.new
  ary.each {|x| (retary << Hash[regex.match(x).names.map{|key| key.to_sym}.zip(regex.match(x).captures)]) if regex.match(x)}
  return retary
end
regex = %r{(?<name>\w+) (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|((?<= ).*$))}
h_ary = s_ary_to_hash(File.readlines(filename), regex)
What about this regex?
^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$
http://rubular.com/r/awEP9Mz1kB
def s_ary_to_hash(ary, regex, mappings)
  retary = Array.new
  for item in ary
    tmp = regex.match(item)
    if tmp then
      hash = Hash.new
      retary.push(hash)
      mappings.each { |mapping|
        mapping.each { |key, groups|
          # Use the first listed group that actually matched
          for group in groups
            if tmp[group] then
              hash[key] = tmp[group]
              break
            end
          end
        }
      }
    end
  end
  return retary
end
regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
h_ary = s_ary_to_hash(
  File.readlines(filename),
  regex,
  [
    {:name => ['name']},
    {:hier => ['hier','p1']}
  ]
)
puts h_ary
{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse\r"}
{:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool\r"}
{:name=>"tops", :hier=>"top_bat/car[0]"}
Since Ruby 2.0.0 doesn't support branch reset, I have built a solution that adds some more power to the s_ary_to_hash function. It now takes a third parameter indicating how to build the final array of hashes.
This third parameter is an array of hashes. Each hash in this array has one key (K) corresponding to the key in the final array of hashes. K is associated with an array containing the named groups to use from the passed regex (the second parameter of the s_ary_to_hash function).
If a group equals nil, s_ary_to_hash skips it and tries the next group.
If all groups equal nil, K is not added to the resulting hash. Feel free to modify s_ary_to_hash if this isn't the desired behavior.
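If the only goal is to leave certain captures (such as 'p1') out of each hash, a smaller change may be enough. The following is a minimal sketch, not the method from the question: the exclude parameter is a hypothetical addition, and the regex is a toy one chosen only to make the behavior easy to follow. Names listed in exclude that the regex does not define are simply ignored, so callers whose regex has no 'p1' group are unaffected.

```ruby
# Sketch only: `exclude` is a hypothetical third parameter, not part of the
# original method. It lists capture-group names to leave out of each hash.
def s_ary_to_hash(ary, regex, exclude = [])
  ary.each_with_object([]) do |line, result|
    m = regex.match(line)
    next unless m

    # Pair each group name with its capture, then drop the excluded names.
    pairs = m.names.zip(m.captures).reject { |key, _| exclude.include?(key) }
    result << pairs.map { |key, value| [key.to_sym, value] }.to_h
  end
end

# Toy regex for illustration; it is not the regex from the question.
toy_regex = %r{(?<name>\w+) (?<p1>top_\w+)/(?<hier>\k<p1>/\S+)}
lines = ["name1 top_cat/top_cat/dog"]

with_p1    = s_ary_to_hash(lines, toy_regex)          # keeps :name, :p1, :hier
without_p1 = s_ary_to_hash(lines, toy_regex, ["p1"])  # keeps :name, :hier

p with_p1
p without_p1
```

Filtering before the Hash is built avoids a second pass over the returned array to delete keys.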
Edit: I've changed the method s_ary_to_hash to conform with what I now understand to be the criterion for excluding directories, namely, directory d is to be excluded if there is a downstream directory with the same name, or the same name followed by a non-negative integer in brackets. I've applied that to all directories, though I may have misunderstood the question; perhaps it should apply only to the first.
data =<<THE_END
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
THE_END
text = data.split("\n")
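def s_ary_to_hash(ary)
  ary.map do |s|
    # Split each line at the first space into the name and the path
    name, _, downstream_path = s.partition(' ').map(&:strip)
    arr = []
    downstream_dirs = downstream_path.split('/')
    # Debug output: print each directory in the path
    downstream_dirs.each {|d| puts "'#{d}'"}
    while downstream_dirs.any? do
      dir = downstream_dirs.shift
      # Keep dir unless the same name (optionally followed by [n]) appears downstream
      arr << dir unless downstream_dirs.any? { |d|
        d == dir || d =~ /#{dir}\[\d+\]/ }
    end
    { name: name, hier: arr.join('/') }
  end
end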
s_ary_to_hash(text)
# => [{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse"},
# {:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
# {:name=>"tops", :hier=>"top_bat/car[0]"},
# {:name=>"ab123", :hier=>"top_2/top_1/top_3/top_4/dog"}]
The exclusion criterion is implemented in downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }, where dir is the directory that is being tested and downstream_dirs is an array of all the downstream directories. (When dir is the last directory, downstream_dirs is empty.) Localizing it in this way makes it easy to test and change the exclusion criterion. You could shorten this to a single regex and/or make it a method:
def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end
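To illustrate why isolating the criterion makes it easy to test and change, here is a short sketch that exercises it on a few hand-picked inputs. The strict_exclude_dir? variant is an assumption of mine, not part of the answer: it escapes dir with Regexp.escape and anchors the pattern, because the interpolated regex in the original is unanchored and treats characters such as [ and ] in dir as regex syntax.

```ruby
# Exclusion criterion as described above, wrapped in a method.
def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end

# Hypothetical stricter variant (an assumption, not from the answer):
# dir is escaped and the pattern must match the whole directory name.
def strict_exclude_dir?(dir, downstream_dirs)
  pattern = /\A#{Regexp.escape(dir)}\[\d+\]\z/
  downstream_dirs.any? { |d| d == dir || d =~ pattern }
end

p exclude_dir?('top_cat', ['mouse', 'top_cat'])  # same name downstream
p exclude_dir?('hat', ['top_ab12', 'hat[1]'])    # same name plus [n] downstream
p exclude_dir?('dog', ['elephant', 'horse'])     # nothing similar downstream
p exclude_dir?('cool', [])                       # last directory in the path

# The two variants differ on partial names such as 'at' versus 'hat[1]'.
p exclude_dir?('at', ['hat[1]'])
p strict_exclude_dir?('at', ['hat[1]'])
```

The first two calls return true and the next two return false. The last pair shows the difference: the unanchored version excludes 'at' because 'at[1]' occurs inside 'hat[1]', while the anchored version does not.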
Here is a non-regexp solution:
# string holds the contents of the file
result = string.each_line.map do |line|
  name, path = line.split(' ')
  path = path.split('/')
  # Keep everything from the last occurrence of the root directory onward
  last_occur_of_root = path.rindex(path.first)
  path = path[last_occur_of_root..-1]
  {name: name, hier: path.join('/')}
end
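As a quick check of this approach, the sketch below runs the same logic on the example file from the question. The heredoc and the variable names (string, dirs, last_root) are mine, chosen for illustration; the key used here is hier, matching the expected output.

```ruby
# Sample lines copied from the example file in the question.
string = <<~DATA
  name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
  tops top_bat/car[0]
  ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
DATA

result = string.each_line.map do |line|
  name, path = line.split(' ')
  dirs = path.split('/')
  # Keep everything from the last occurrence of the root directory onward.
  last_root = dirs.rindex(dirs.first)
  { name: name, hier: dirs[last_root..-1].join('/') }
end

result.each { |h| p h }
```

On these four lines the result matches the hashes listed under "Expected output" in the question. Note that rindex compares whole path segments for equality, so path0_top_ab12 is not mistaken for the root top_ab12. The approach does assume that a repeated root appears with exactly the same text as the first segment.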