Can't figure out where my regex is going wrong

Question

I'm writing a JavaScript bookmarklet as a side project for work (I don't code for a living and am very much a beginner).

It scans through a cnn.com transcript and picks out the names and titles of the live guests, excluding those that are played from tape.

To do this I grab the page text, then use replace() and a regex to remove the text between BEGIN VIDEO CLIP and END VIDEO CLIP, and then use another regular expression to scan for everything that matches the NAME, TITLE: format. It works like a charm on some transcripts, and fails miserably on others. Here's my code:

(function () {
    var webPage = document.body.innerText;
    var tape = webPage.replace(/(BEGIN VIDEO CLIP)([\s\S]*)(END VIDEO CLIP)|(BEGIN VIDEOTAPE)([\s\S]*)(END VIDEOTAPE)/g, "");
    var searchForGuests = /[A-Z ].+,[A-Z0-9 ].+:/g;
    var guests = tape.match(searchForGuests).join("; ");
    alert("Guests: " + guests)
})();

As an example, when applied to http://transcripts.cnn.com/TRANSCRIPTS/1303/05/pmt.01.html, it alerts only the name of the host (Piers Morgan), even though there are several live guests. Is it my regex that's the problem? I've been testing in Regexr, and as far as I can tell I'm not using anything that is illegal in JavaScript.

It should work on any of the transcripts listed at http://transcripts.cnn.com/transcripts.

Answer 1

The major problem here is probably the greedy [\s\S]*, which will match and remove too much. When a transcript contains more than one clip, the greedy version runs from the first BEGIN VIDEO CLIP all the way to the last END VIDEO CLIP, so any live guests who speak between clips are removed as well. Try [\s\S]*? instead. The added ? after the * makes it match as little as possible (instead of as much as possible).
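To see the difference between the greedy and lazy quantifier, here is a minimal sketch. The transcript text, the speaker names, and the variable names are made up for illustration and are not taken from the CNN page; only the BEGIN/END VIDEO CLIP markers come from the question.

```javascript
// Hypothetical transcript: two taped clips with one live guest between them.
const sample = [
  "MORGAN: Welcome back.",
  "(BEGIN VIDEO CLIP)",
  "TAPED PERSON, SOME TITLE: Taped remark.",
  "(END VIDEO CLIP)",
  "JANE DOE, LIVE GUEST: Live remark.",
  "(BEGIN VIDEO CLIP)",
  "OTHER PERSON, OTHER TITLE: Another taped remark.",
  "(END VIDEO CLIP)",
  "MORGAN: Good night."
].join("\n");

// Greedy: [\s\S]* runs from the first BEGIN to the LAST END.
const greedy = /BEGIN VIDEO CLIP[\s\S]*END VIDEO CLIP/g;
// Lazy: [\s\S]*? stops at the NEAREST END, so each clip is removed separately.
const lazy = /BEGIN VIDEO CLIP[\s\S]*?END VIDEO CLIP/g;

const greedyResult = sample.replace(greedy, "");
const lazyResult = sample.replace(lazy, "");

console.log(greedyResult.includes("JANE DOE")); // false: the live guest was swallowed
console.log(lazyResult.includes("JANE DOE"));   // true: the live guest survives
```

The same change applies to the BEGIN VIDEOTAPE / END VIDEOTAPE alternative in the original pattern.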

Answer 2

In your searchForGuests regex, try ^([A-Za-z0-9, ]+(?=:))

If your text is this:

TOM COUGHLIN, NFL COACH: Preparation is the key to success. 
MORGAN: Plus he's worn out his Oscar welcome but she's Hollywood's golden girl, Kristin Chenoweth.

It'll return these matches:

TOM COUGHLIN, NFL COACH
MORGAN
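A minimal sketch of how that pattern could be applied to the two lines above. The g and m flags are an assumption on my part: the answer does not state them, but without m the ^ anchor only matches at the very start of the string, so only the first speaker would be found. The variable names and the final comma filter are also illustrative additions.

```javascript
const text = [
  "TOM COUGHLIN, NFL COACH: Preparation is the key to success.",
  "MORGAN: Plus he's worn out his Oscar welcome but she's Hollywood's golden girl, Kristin Chenoweth."
].join("\n");

// Assumed flags: g = find every match, m = let ^ match at the start of each line.
const speakerPattern = /^([A-Za-z0-9, ]+(?=:))/gm;

// match() returns null when nothing matches, so fall back to an empty array.
const speakers = text.match(speakerPattern) || [];
console.log(speakers); // [ 'TOM COUGHLIN, NFL COACH', 'MORGAN' ]

// Illustrative extra step: keep only NAME, TITLE entries to drop the host.
const guests = speakers.filter(function (s) { return s.indexOf(",") !== -1; });
console.log(guests.join("; ")); // TOM COUGHLIN, NFL COACH
```

The `|| []` fallback also avoids the TypeError that the original bookmarklet would throw by calling join() on null when no speaker lines are found.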

2 answers

Answer 1: score 0, accepted, 2013-03-08 20:45:09

Answer 2: score 0, 2013-03-08 20:48:37
