
Extract substring from string in Ruby containing "["

I have a file containing data like this:

[date,ip]:string{[0.892838,1.28820,8.828823]}

and I want to extract the data 0.892838, 1.28820, 8.828823 into a string for later processing.

I have used the pattern line = String =~ /\\[/ to get the position where the "[" occurs, but for the above input I get this error message:

premature end of char-class /\\[/

How about this?

str = '[date,ip]:string{[0.892838,1.28820,8.828823]}'
# Escape the dot so that it only matches a literal decimal point.
str.scan(/\d+\.\d+/)
# => ["0.892838", "1.28820", "8.828823"]
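As for the error in the question: in the pattern /\\[/ the first backslash escapes the second one, so the regex contains a literal backslash followed by an unescaped "[", which opens a character class that is never closed. A single backslash is enough to match a literal bracket. A minimal sketch follows; the variable names are illustrative and not taken from the question.

```ruby
line = '[date,ip]:string{[0.892838,1.28820,8.828823]}'

# One backslash escapes the bracket, so this matches a literal "[".
first_bracket = line =~ /\[/        # index of the first "["

# The number list starts at the last "[", which String#rindex finds
# without any regex at all.
list_start = line.rindex('[')

# Building the question's pattern from a string reproduces the failure
# at run time instead of at parse time: the regex source is \\[
# (an escaped backslash followed by an unclosed character class).
begin
  Regexp.new('\\\\[')
  broken_pattern_error = nil
rescue RegexpError => e
  broken_pattern_error = e
end

puts first_bracket
puts list_start
puts broken_pattern_error.class
```

Knowing the position of the bracket is only half the job, though; the answers here extract the values directly, which is usually simpler.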

Using a capturing group:

'[date,ip]:string{[0.892838,1.28820,8.828823]}' =~ /{\[(.*?)\]}/
# => 16
$1            # => "0.892838,1.28820,8.828823"
$1.split(',') # => ["0.892838", "1.28820", "8.828823"]
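One caveat with the snippet above: if a line does not contain the "{[...]}" section, $1 is nil and calling split on it raises NoMethodError. A possible sketch that avoids the global variable and tolerates such lines is shown here; extract_values is a hypothetical helper name, not something from the question.

```ruby
# Hypothetical helper: returns the comma-separated values found between
# "{[" and "]}", or an empty array when the line has no such section.
def extract_values(line)
  match = line.match(/\{\[(.*?)\]\}/)
  match ? match[1].split(',') : []
end

values  = extract_values('[date,ip]:string{[0.892838,1.28820,8.828823]}')
joined  = values.join(', ')    # a single string for later processing
floats  = values.map(&:to_f)   # or numbers, if arithmetic is needed
missing = extract_values('no brackets here')

puts joined
```

Returning an empty array for non-matching lines is a design choice made for this sketch; raising an error or returning nil may suit other callers better.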

As I'm prone to do:

require 'fruity'

str = '[date,ip]:' + ('string' * 1) + '{[0.892838,1.28820,8.828823]}'
compare do
  arup { str.scan(/\d+.\d+/) }
  falsetrue { str =~ /{\[(.*?)\]}/; $1.split(',') }
  ttm { str[/\[([^\]]+)\]}$/, 1].split(',') }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> falsetrue is similar to ttm
# >> ttm is faster than arup by 2x ± 0.1

The longer the 'string' section is, the more the run times of the various attempts diverge:

require 'fruity'

str = '[date,ip]:' + ('string' * 1000) + '{[0.892838,1.28820,8.828823]}'
compare do
  arup { str.scan(/\d+.\d+/) }
  falsetrue { str =~ /{\[(.*?)\]}/; $1.split(',') }
  ttm { str[/\[([^\]]+)\]}$/, 1].split(',') }
end

# >> Running each test 512 times. Test will take about 2 seconds.
# >> ttm is faster than falsetrue by 60.00000000000001% ± 10.0%
# >> falsetrue is faster than arup by 13x ± 1.0

The "ttm" result improves in speed because of the '$' anchor. That anchor gives the regular expression engine the information it needs to know where to search immediately. Without it, the engine would start at the beginning of the string and search forward, so the longer the 'string' component, the more time it takes to find the desired pattern.
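One detail worth knowing when relying on that anchor: in Ruby, '$' matches at the end of a line (including just before a newline), whereas '\z' matches only at the very end of the string. Since the data comes from a file, lines will often carry a trailing newline. The sketch below assumes such a line; the variable names are illustrative only.

```ruby
pattern_line_end   = /\[([^\]]+)\]}$/    # "$": end of a line
pattern_string_end = /\[([^\]]+)\]}\z/   # "\z": very end of the string

# A line as it might be read from a file, newline included (assumption).
raw_line = "[date,ip]:string{[0.892838,1.28820,8.828823]}\n"

with_dollar = raw_line[pattern_line_end, 1]          # matches before "\n"
with_z      = raw_line[pattern_string_end, 1]        # nil: "\n" is in the way
after_chomp = raw_line.chomp[pattern_string_end, 1]  # matches once chomped

p with_dollar
p with_z
p after_chomp
```

So the '$' version keeps working on unchomped lines, while a '\z' version needs the newline removed first.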

Experiment with the expressions using benchmarks and you can find the best average speed and expression for a particular task.
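If the fruity gem is not available, a rough comparison can also be made with the Benchmark module from Ruby's standard library. This is only a sketch: the labels and the iteration count are arbitrary choices, the scan pattern uses an escaped dot, the capture variant uses String#[] instead of $1, and the timings will differ from the fruity figures above.

```ruby
require 'benchmark'

str = '[date,ip]:' + ('string' * 1000) + '{[0.892838,1.28820,8.828823]}'
iterations = 1_000 # arbitrary; raise it for steadier numbers

candidates = {
  'scan'    => -> { str.scan(/\d+\.\d+/) },
  'capture' => -> { str[/\{\[(.*?)\]\}/, 1].split(',') },
  'anchor'  => -> { str[/\[([^\]]+)\]}$/, 1].split(',') }
}

# Run each candidate once so the results can be compared before timing.
results = candidates.transform_values(&:call)

# Time each candidate over the same number of iterations.
timings = candidates.transform_values do |candidate|
  Benchmark.realtime { iterations.times { candidate.call } }
end

# Print the candidates from fastest to slowest.
timings.sort_by { |_, seconds| seconds }.each do |name, seconds|
  puts format('%-8s %.4f s', name, seconds)
end
```

Changing the multiplier in ('string' * 1000) is an easy way to see how each expression scales with the length of the input.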

If the "string" section is always short, the difference for a single pass is so small that it won't really matter, and then it's sensible to use the most easily read (and easily maintained) code, which would be str.scan(/\d+.\d+/). If the code is in a loop and being run millions of times, then it starts to make a difference and one of the others might be more sensible.
