Ruby regex uses the last match to delimit the string, but should use the first

Question

I'm parsing the source of a website and I'm using this regex:

/page\.php\?id\=([0-9]*)\"\>(.*)\<\/a\>\<\/span\>/.match(self.agent.page.content)

self.agent.page.content contains the source of the page fetched by Mechanize. The regex basically works, but the second capture group fetches more than it should: the source contains more than one <\/a\>\<\/span\>, and the regex matches up to the last one, so I get a bunch of HTML crap. How can I tell the regex to use the first occurrence as the "end marker"?

Answer 1

.* is greedy (it matches as much as possible), whereas .*? is non-greedy (it stops at the first point where the rest of the pattern can match). Try:

/page\.php\?id\=([0-9]*)\"\>(.*?)\<\/a\>\<\/span\>/.match(self.agent.page.content)
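
To see the difference in isolation, here is a minimal sketch. The html string is a made-up stand-in for self.agent.page.content, and the variable names are only illustrative; the patterns are the ones from the question and the answer, written with %r{} so the slashes need no escaping.

```ruby
# Hypothetical page source; in the question this would be self.agent.page.content.
html = '<span><a href="page.php?id=12">First</a></span>' \
       '<span><a href="page.php?id=34">Second</a></span>'

# Greedy: (.*) runs to the LAST </a></span> in the string.
greedy = %r{page\.php\?id=([0-9]*)">(.*)</a></span>}
# Non-greedy: (.*?) stops at the FIRST </a></span> after the link text.
lazy = %r{page\.php\?id=([0-9]*)">(.*?)</a></span>}

greedy_title = greedy.match(html)[2]
lazy_title   = lazy.match(html)[2]

# String#scan returns the capture groups of every match, not just the first.
all_links = html.scan(lazy)

puts greedy_title  # contains leftover markup from the following link
puts lazy_title    # just the text of the first link
p all_links        # one [id, title] pair per link
```

If every link on the page is needed rather than only the first, scan with the non-greedy pattern is probably the more convenient call than match.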

Answer 1: 4, accepted, 2012-04-05 17:57:01
