What's wrong with this RegEx?

I'm trying to implement this in a small Ruby script, and I tested it on http://www.rubular.com/, where it worked perfectly. I'm not sure why it doesn't work in the actual script.

The regex: /(motion|links|sound|button|symbol)|(0.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

The text it's matched against:

Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732

Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043

etc.

What I am trying to grab: only the numbers, and the single word after "Trial Type". So for the first line of the example, I would only want " 1 motion 1 0.87913100 1302969732" to be returned. I also want to keep the space before the first number in each trial.

My short Ruby script:

File.open('log.txt', 'r') do |file|
  contents = file.readlines.to_s
  regex = Regexp.new(/(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/)
  matchdata = regex.match(contents).to_a
  matchdata.each do |match|
    if match != nil
      puts match
    end
  end
end

It only outputs two "1"s, though. Hmm... I know it's reading the file contents correctly, and when I tried a simpler alternative regex it worked fine.

Thanks for any help I get here! :)

You want to use String#scan:

 matchdata = contents.scan(regex)

Also @Mike Penington is correct: you shouldn't need the if match != nil check if you do it right. You have to clean up your regex as well. The pipe character is a special character in a regex that means "match the left side OR the right side", and your text contains literal pipe characters, which you must escape if you want to match them.
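To see why the original script printed only two "1"s, here is a small sketch comparing Regexp#match and String#scan on one sample line. The constant names REGEX and LINE are illustrative; the regex is the one from the question's script.

```ruby
# The question's regex (with the dot before \d{8} escaped, as in the script).
REGEX = /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

# One sample log line; the trailing newline lets the last group (\d{10}\s) match.
LINE = "Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732\n"

# Regexp#match finds only the FIRST match. MatchData#to_a is
# [whole match, group1, group2, group3, group4]; here the whole match and
# group 3 are both " 1 ", which is why the script printed "1" twice.
first = REGEX.match(LINE).to_a
p first.compact
# => [" 1 ", " 1 "]

# String#scan finds EVERY match. Because the regex has groups, each element is
# an array of the four groups, with nil for the alternatives that did not match.
all = LINE.scan(REGEX)
p all.flatten.compact
# => [" 1 ", "motion", " 1 ", "0.87913100", "1302969732\n"]
```

With this particular regex the nils are still there inside each scan result, which is why the answers suggest rewriting the pattern rather than just swapping match for scan.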

You need to escape the literal pipes inside the regex, fill in the other missing literals (like Trick, \?, Click\sTime:), remove some of the spaces, etc., and insert regex whitespace (\s) where appropriate, i.e.:

regex = /(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/

EDIT: fixed parenthesis nesting in the original.
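A quick way to check the corrected regex is to run it over both sample lines with String#scan. This is a sketch; the names TRIAL_REGEX and CONTENTS are illustrative. Note that this pattern captures only the trial type, the click time and the timestamp, not the Trial ID or the Trick flag.

```ruby
# The corrected regex from this answer (stray closing parenthesis removed).
TRIAL_REGEX = /(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/

# Stand-in for the file contents read in the question's script.
CONTENTS = [
  "Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732",
  "Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043"
].join("\n")

# With capture groups, scan returns one array of captures per match,
# so there are no nil entries to filter out.
CONTENTS.scan(TRIAL_REGEX).each do |type, time, timestamp|
  puts [type, time, timestamp].join(" ")
end
# motion 0.87913100 1302969732
# button 0.19817800 1302987043
```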

If you know that the data follows a particular pattern, you can just follow that pattern in the regex and pick up the portions you want with ( ).

/Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
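Applied line by line, the captures of this pattern give exactly the output the question asks for. Below is a minimal sketch; the helper name extract_trial is hypothetical, and the leading space is added by hand because the question wants to keep it.

```ruby
# The pattern from this answer, one capture group per wanted field.
PATTERN = /Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/

# Hypothetical helper: returns " ID type trick time timestamp" for a trial line,
# or nil when the line does not match the pattern.
def extract_trial(line)
  m = PATTERN.match(line)
  return nil unless m
  " " + m.captures.join(" ")
end

puts extract_trial("Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732")
# prints " 1 motion 1 0.87913100 1302969732"

# To process the question's log.txt one line at a time:
# File.foreach('log.txt') { |line| result = extract_trial(line); puts result if result }
```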

The more you know in advance about the data, the more specific you can make the regex. If the data has some variations and the regex fails to match, just relax the pattern:

  • If the Trial ID may include a decimal point, use [\.\d]+ instead of \d+.
  • If there can be more than one space, replace the space with [ ]+.
  • If the space can be a tab, or can be absent, use \s* or [ \t]*.
  • If the "Trial ID:" part may appear as a different phrase, replace it with .*?

and so on.

If you are not sure how many spaces/tabs appear, use this:

/Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/
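To see the difference, here is a sketch that runs the strict pattern and the relaxed one against a made-up line with irregular whitespace. The line MESSY is invented for illustration and is not taken from the question's log; the constant names are likewise illustrative.

```ruby
# The strict pattern from above and the relaxed version from this answer.
STRICT = /Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
RELAXED = /Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/

# Hypothetical line with a tab, doubled spaces and a missing space.
MESSY = "Trial ID:\t7 |  Trial Type: button|Trick?  0\tClick Time:0.19817800   1302987043"

p STRICT.match(MESSY)            # => nil
p RELAXED.match(MESSY).captures  # => ["7", "button", "0", "0.19817800", "1302987043"]
```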

This is one of those times when trying to do everything in one big regex makes you work too hard. Simplify things:

ary = [
  'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732',
  'Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043'
]

ary.each do |li|
  numbers = li.scan(/[\d.]+/)
  trial_type = li[/Trial Type: (\w+)/, 1]

  # The leading space and %.8f reproduce the output format asked for in the question.
  puts " %d %s %d %.8f %d" % [numbers.first, trial_type, *numbers[1..-1]]
end
# >>  1 motion 1 0.87913100 1302969732
# >>  7 button 0 0.19817800 1302987043
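The same idea also applies directly to the question's file. Below is a sketch that reads the log one line at a time with File.foreach instead of file.readlines.to_s (on Ruby 1.9 and later, Array#to_s is the same as inspect, so that call produces the array's printed representation rather than the raw file text). The helper names summarize and summarize_file are hypothetical, and a Tempfile stands in for log.txt so the example runs on its own.

```ruby
require "tempfile"

# Hypothetical helper: turn one log line into " ID type trick time timestamp",
# or nil if the line does not look like a trial record.
def summarize(line)
  trial_type = line[/Trial Type: (\w+)/, 1]
  numbers = line.scan(/[\d.]+/)
  return nil if trial_type.nil? || numbers.size != 4
  " %d %s %d %.8f %d" % [numbers[0], trial_type, *numbers[1..3]]
end

# Read the file one line at a time instead of joining readlines into one string.
def summarize_file(path)
  File.foreach(path).map { |line| summarize(line) }.compact
end

# Demo with a temporary file standing in for the question's log.txt.
Tempfile.create("log") do |f|
  f.write("Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732\n")
  f.write("Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043\n")
  f.flush
  puts summarize_file(f.path)
end
```

Lines that don't look like trial records (blank lines, for example) are simply skipped.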

Regex patterns are powerful, but people think it's macho to do everything in one big line. You have to weigh that against the extra work needed to put the regex together in the first place, plus maintaining it if the format of the text being parsed changes later.
