Using Regex and ruby regular expressions to find values

I'm currently trying to sort values from a file. I'm stuck on finding the first attribute, and I'm not sure why. I'm new to regex and Ruby, so I'm not sure how to approach the problem. I'm trying to find the values of a, b, c, d and e, which are all positive numbers.

Here's what the line will look like

length=<a> begin=(<b>,<c>) end=(<d>,<e>)

Here's what I'm using to find the values

current_line = file.gets
if current_line == nil then return end
while current_line = file.gets do
   if line =~ /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
       length, begin_x, begin_y, end_x, end_y = $1, $2, $3, $4, $5
       puts("length:" + length.to_s + " begin:" + begin_x.to_s + "," + begin_y.to_s + " end:" + end_x.to_s + "," + end_y.to_s)
   end
end

For some reason it never prints anything, so I'm assuming it never finds a match.

Sample input: length=4 begin=(0,0) end=(3,0)


Each line has 2 integers followed by 0-4 decimals separated by commas. So it could be any of these:

2 4 1.3434324,3.543243,4.525324   
1 2     
18 3.3213,9.3233,1.12231,2.5435    
7 9 2.2,1.899990    
0 3 2.323    

Here is your regex:

r = /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
str.scan(r)
  #=> []

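First, we need to escape the parentheses. An unescaped ( or ) only opens or closes a capture group; it does not match a literal parenthesis in the string: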

r = /length=<(\d+)> begin=\((\d+),(\d+)\) end=\((\d+),(\d+)\)/
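To see the effect of the escaping in isolation, here is a small sketch run against the question's sample input. That sample has no angle brackets, so the patterns below leave them out; this is an assumption about the real data, not part of the regex above.

```ruby
sample = "length=4 begin=(0,0) end=(3,0)"

# Unescaped parentheses only create capture groups; they do not match
# the literal "(" and ")" characters in the input.
unescaped = /length=(\d+) begin=((\d+),(\d+)) end=((\d+),(\d+))/

# Escaped parentheses match the literal characters.
escaped = /length=(\d+) begin=\((\d+),(\d+)\) end=\((\d+),(\d+)\)/

p sample.scan(unescaped)  #=> []
p sample.scan(escaped)    #=> [["4", "0", "0", "3", "0"]]
```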

Next, add the missing < and > after "begin" and "end".

r = /length=<(\d+)> begin=\(<(\d+)>,<(\d+)>\) end=\(<(\d+)>,<(\d+)>\)/

Now let's try it:

str = "length=<4779> begin=(<21>,<47>) end=(<356>,<17>)" 

but first, let's set the mood

str.scan(r)
  #=> [["4779", "21", "47", "356", "17"]]

Success!
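The loop in the question has two further problems that the regex alone does not fix: it assigns each line to current_line but tests line, and the initial file.gets consumes the first line before the loop starts. A minimal sketch of one way to restructure it follows. StringIO stands in for the opened file, and the sample lines are made up for illustration.

```ruby
require 'stringio'

# StringIO stands in for the opened file (hypothetical sample data).
file = StringIO.new(<<~DATA)
  length=<4779> begin=(<21>,<47>) end=(<356>,<17>)
  not a matching line
  length=<3> begin=(<0>,<0>) end=(<3>,<0>)
DATA

r = /length=<(\d+)> begin=\(<(\d+)>,<(\d+)>\) end=\(<(\d+)>,<(\d+)>\)/

results = []
# each_line yields every line, including the first one.
file.each_line do |line|
  m = r.match(line)
  next unless m # skip lines that do not match
  length, begin_x, begin_y, end_x, end_y = m.captures
  results << "length:#{length} begin:#{begin_x},#{begin_y} end:#{end_x},#{end_y}"
end

puts results
```

Using the MatchData returned by match keeps the captures local to the block instead of relying on the global $1..$5 variables.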

Lastly (though probably not necessary), we might replace the single spaces with \s+, which permits one or more whitespace characters:

r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/
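As a quick check of what \s+ buys, the sketch below uses \s+ in both positions and runs it against one string with single spaces and one with irregular whitespace. Both input strings are made up for illustration.

```ruby
r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/

single = "length=<4> begin=(<0>,<0>) end=(<3>,<0>)"
spaced = "length=<4>   begin=(<0>,<0>)\tend=(<3>,<0>)" # three spaces, then a tab

p single.scan(r) #=> [["4", "0", "0", "3", "0"]]
p spaced.scan(r) #=> [["4", "0", "0", "3", "0"]]
```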

Addendum

The OP has asked how this would be modified if some of the numeric values were floats. I do not understand precisely what has been requested, but the following could be modified as required. I've assumed all the numbers are non-negative. I've also illustrated one way to "build" a regex, using Regexp.new.

  s1 = '<(\d+(?:\.\d+)?)>' # note single quotes, which keep the backslashes literal
    #=> "<(\\d+(?:\\.\\d+)?)>" 
  s2 = "=\\(#{s1},#{s1}\\)"
    #=> "=\\(<(\\d+(?:\\.\\d+)?)>,<(\\d+(?:\\.\\d+)?)>\\)" 
  r = Regexp.new("length=#{s1} begin#{s2} end#{s2}")
    #=> /length=<(\d+(?:\.\d+)?)> begin=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\) end=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\)/ 

  str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)" 

  str.scan(r)
    #=> [["47.79", "21", "4.7", "0.356", "17.999"]] 
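The captures are still strings at this point. If numeric values are wanted, one option is to convert them after matching. This is a sketch under the same assumptions as above (non-negative numbers, angle brackets present).

```ruby
r = /length=<(\d+(?:\.\d+)?)> begin=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\) end=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\)/
str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)"

m = r.match(str)
# Float() raises ArgumentError on malformed input, unlike String#to_f,
# which silently returns 0.0 for text that is not a number.
values = m.captures.map { |s| Float(s) }

p values #=> [47.79, 21.0, 4.7, 0.356, 17.999]
```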

Sample input:

 length=4 begin=(0,0) end=(3,0) 

data.txt:

length=3 begin=(0,0) end=(3,0)
length=4 begin=(0,1) end=(0,5)
length=2 begin=(1,3) end=(1,5)

Try this:

require 'pp'

Line = Struct.new(
  :length, 
  :begin_x,
  :begin_y,
  :end_x,
  :end_y,
)

lines = []

IO.foreach('data.txt') do |line|
  numbers = []

  line.scan(/\d+/) do |match|
    numbers << match.to_i
  end

  lines << Line.new(*numbers)
end

pp lines

puts lines[-1].begin_x

--output:--
[#<struct Line length=3, begin_x=0, begin_y=0, end_x=3, end_y=0>,
 #<struct Line length=4, begin_x=0, begin_y=1, end_x=0, end_y=5>,
 #<struct Line length=2, begin_x=1, begin_y=3, end_x=1, end_y=5>]
1
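Because scan(/\d+/) picks up every run of digits, a malformed line would either leave some struct members nil (too few numbers) or raise an ArgumentError (too many). A minimal sketch of a guard follows; parse_line is a hypothetical helper name, not something from the code above.

```ruby
Line = Struct.new(:length, :begin_x, :begin_y, :end_x, :end_y)

# Hypothetical helper: returns a Line, or nil when the text does not
# contain exactly five integers.
def parse_line(text)
  numbers = text.scan(/\d+/).map(&:to_i)
  return nil unless numbers.size == Line.members.size
  Line.new(*numbers)
end

p parse_line("length=4 begin=(0,1) end=(0,5)")
#=> #<struct Line length=4, begin_x=0, begin_y=1, end_x=0, end_y=5>
p parse_line("length=4 begin=(0,1)")
#=> nil
```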

With this data.txt:

2 4 1.3434324,3.543243,4.525324   
1 2     
18 3.3213,9.3233,1.12231,2.5435    
7 9 2.2,1.899990    
0 3 2.323    

Try this:

require 'pp'

data = []

IO.foreach('data.txt') do |line|
  pieces = line.split
  csv_numbers = pieces[-1]

  next if not csv_numbers.index('.') #skip the case where there are no floats on a line

  floats = csv_numbers.split(',')
  data << floats.map(&:to_f)
end

pp data

--output:--
[[1.3434324, 3.543243, 4.525324],
 [3.3213, 9.3233, 1.12231, 2.5435],
 [2.2, 1.89999],
 [2.323]]
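One caveat: on a completely blank line, line.split returns an empty array, so pieces[-1] is nil and the call to index raises NoMethodError. If blank lines can occur, a variant along these lines avoids that. StringIO stands in for data.txt, and the blank line in the input is added for illustration.

```ruby
require 'stringio'

# StringIO stands in for data.txt; the blank second line is an assumption
# added to show the guard.
input = StringIO.new("2 4 1.3434324,3.543243,4.525324\n\n1 2\n0 3 2.323\n")

data = []
input.each_line do |line|
  last = line.split.last          # nil for a blank line
  next unless last&.include?('.') # skip blank lines and lines with no floats
  data << last.split(',').map(&:to_f)
end

p data #=> [[1.3434324, 3.543243, 4.525324], [2.323]]
```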
