Ruby Regex: matching multiple parts of a string

Question

Using Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

I have the following string:

str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> <img src="/images/layout/placeholder.png" width="222" height="149"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'

I want to match the following:

35 (the number that is part of the href attribute value)
TU4 Sep 5 Activity 1 (the text of the <a> tag)
First question from Manager on TU4 Sep 5 Activity 1. (the remaining text after the last <br/><br/> tags)

To achieve this, I have written the following regex:

result = str.match(/<a href="\/activities\/(?<activity_id>\d+)">(?<activity_title>.*)<\/a>.*<br\/><br\/>(?<message>.*)/)

This produces the following result:

#<MatchData "<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1." 
         activity_id:"35" 
         activity_title:"TU4 Sep 5 Activity 1" 
         message:"First question from Manager on TU4 Sep 5 Activity 1.">

But I guess this is not efficient. Is it possible to have only the required values (the three listed above) returned in the match result, with the following full-match value excluded from it?

"<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1."

Thanks,

Jignesh

Answer 1

The appropriate way to do this is NOT to use regexen. Instead, use the Nokogiri library to parse your HTML easily:

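require 'nokogiri'

doc = Nokogiri::HTML.parse(str)
# First <a> whose href starts with "/activities"
link = doc.at_css('a[href^="/activities"]')
activity_id = link['href'][/\d+$/]          # trailing digits of the href
activity_title = link.inner_text
# Last text node in the document, converted to a String
message = doc.search("//text()").last.text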

This will do exactly what your regexp was attempting, with a much lower chance of random failure.
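
If you do stay with the regex, note that the full matched string is always element 0 of the MatchData, so it cannot be dropped from the result itself. What you can do is read only the named groups and ignore the rest. The following is a minimal sketch, reusing the string and pattern from the question; the non-greedy `.*?` in the title group and the names `pattern` and `fields` are my own additions, not part of the original post.

```ruby
str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> <img src="/images/layout/placeholder.png" width="222" height="149"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'

# Same pattern as in the question, except that the title group is non-greedy
# (an assumption on my part, so it stops at the first </a>).
pattern = /<a href="\/activities\/(?<activity_id>\d+)">(?<activity_title>.*?)<\/a>.*<br\/><br\/>(?<message>.*)/

result = str.match(pattern)

# Read individual named groups; the full match (result[0]) is simply not used.
activity_id    = result[:activity_id]
activity_title = result[:activity_title]

# Or collect only the captures into a Hash keyed by group name.
# Hash[] is used here because it is also available on Ruby 1.9.
fields = Hash[result.names.zip(result.captures)]
```

Here `fields` holds only the three named values, keyed by the group names as strings, which is probably the closest the regex route gets to "only the required values".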
