Multiple line regex in Ruby

I am trying to strip some repeated text out of my Kindle clippings, which look like this:

 The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
 ==========
 Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
 - Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

 commentators (a euphemism for prolific writers with little experience
 ==========
 Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
 - Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time

I am trying to strip out everything between "Essentials" and "Time". The regexp I am playing with right now looks like this:

Essentials([^,]+)Time

But obviously it is not working:

http://rubular.com/r/gwSJFgOQai

Any help for this nuby would be massively appreciated!

You need the /m modifier, which makes . match a newline:

/Essentials(.*?)Time/m

See it working here: http://rubular.com/r/qgmkWnLzW6
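The pattern in the question fails because the text between "Essentials" and "Time" contains commas (between the author names and after "Friday"), so [^,]+ can never reach "Time". As a minimal sketch of putting the /m pattern to work, the snippet below uses String#gsub to delete each match. The names clippings and cleaned and the shortened highlight text are assumptions for illustration, not part of the original question.

```ruby
# Shortened sample; "first highlight" and "second highlight" stand in for the
# real clipping text.
clippings = <<~EOT
  first highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

  second highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time
EOT

# /m lets . match the line break between the two metadata lines.
# .*? is lazy, so each match stops at the first "Time" it reaches.
# The optional \n removes the line break that follows "Time".
cleaned = clippings.gsub(/Essentials.*?Time\n?/m, "")

puts cleaned
```

Without the lazy quantifier, a greedy .* would run from the first "Essentials" to the last "Time" in the file and swallow the highlights in between.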

Why don't you use this:

/Essentials(.*?)Time/m

Updated. Forgot the m for multiline.

Regexes are powerful, but you'll find they also often add needless complexity to a problem.

This is how I'd go about the problem:

text = <<EOT
The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
==========
Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
- Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

commentators (a euphemism for prolific writers with little experience
==========
Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
- Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time
EOT

text.each_line do |l|
  l.chomp!
  next if ((l =~ /\AEssentials/) .. (l =~ /Time\z/))

  puts l
end

Which outputs:

The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
==========

commentators (a euphemism for prolific writers with little experience
==========

This works because .., AKA the range operator, gains a new capability when used in a conditional such as if, and turns into what we call the flip-flop operator. In operation, ((l =~ /\AEssentials/) .. (l =~ /Time\z/)) returns false until (l =~ /\AEssentials/) matches. From then on it returns true, up to and including the line where (l =~ /Time\z/) matches. After that final regex has matched, it goes back to returning false.
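To make the on/off behavior visible, here is a small sketch that records what the flip-flop decides for each line. The four sample strings and the trace array are made up for illustration; only the two regexes come from the code above.

```ruby
# Hypothetical four-line input: one line before the block, the two
# metadata lines, and one line after.
lines = [
  "keep 1",
  "Essentials of ...",
  "- Highlight ... Time",
  "keep 2"
]

trace = []
lines.each do |l|
  # Inside a conditional, a..b is a flip-flop: it switches on when the
  # left condition is truthy and switches off after the right one is.
  if (l =~ /\AEssentials/)..(l =~ /Time\z/)
    trace << [l, :skipped]
  else
    trace << [l, :kept]
  end
end

trace.each { |line, state| puts "#{state}: #{line}" }
```

Both the line that turns the flip-flop on and the line that turns it off count as inside the range, which is why the two metadata lines are skipped and everything else is kept.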

This behavior works really well for extracting sections from text.

If you are aggregating text for subsequent output, replace the puts l with something that appends l to a buffer, then output that buffer at the end of your run.
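A sketch of that buffered variant might look like the following. The names buffer and result are hypothetical, and the sample text is shortened from the question.

```ruby
# Shortened sample text; the highlight and metadata lines are placeholders.
text = <<~EOT
  first highlight
  ==========
  Essentials of Licensing Intellectual Property (A, B)
  - Highlight on Page 25 | Added on Friday Greenwich Mean Time

  second highlight
  ==========
EOT

buffer = [] # collects the lines we keep

text.each_line do |l|
  l.chomp!
  # Skip everything from the "Essentials" line through the "Time" line.
  next if (l =~ /\AEssentials/)..(l =~ /Time\z/)

  buffer << l
end

# Emit everything in one go at the end of the run.
result = buffer.join("\n")
puts result
```

Collecting lines in an array and joining once at the end keeps the loop body simple and makes it easy to write the result to a file instead of standard output.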
