Not sure where my Regex went wrong
I'm writing a JavaScript bookmarklet as a side project for work (I don't code for a living and am very much a beginner). It scans through a cnn.com transcript and picks out the names and titles of the live guests, excluding those who are played from tape. To do this I grab the page text, then use replace() and a regex to remove the text between BEGIN VIDEO CLIP and END VIDEO CLIP, and then use another regular expression to scan for everything that matches the NAME, TITLE: format. It works like a charm on some transcripts and fails miserably on others. Here's my code:
(function () {
var webPage = document.body.innerText;
var tape = webPage.replace(/(BEGIN VIDEO CLIP)([\s\S]*)(END VIDEO CLIP)|(BEGIN VIDEOTAPE)([\s\S]*)(END VIDEOTAPE)/g, "");
var searchForGuests = /[A-Z ].+,[A-Z0-9 ].+:/g;
var guests = tape.match(searchForGuests).join("; ");
alert("Guests: " + guests)
})();
As an example, when applied to http://transcripts.cnn.com/TRANSCRIPTS/1303/05/pmt.01.html, it alerts only the name of the host (Piers Morgan), even though there are several live guests. Is my regex the problem? I've been testing in Regexr, and as far as I can tell I'm not using anything that is illegal in JavaScript.
It should work on any of the transcripts listed at http://transcripts.cnn.com/transcripts.
The major problem here is probably the greedy [\s\S]*, which will match and remove too much. When a transcript contains several clips, the greedy version matches from the first BEGIN VIDEO CLIP all the way to the last END VIDEO CLIP, so every live guest in between is deleted as well. Try [\s\S]*? instead. The added ? after the * makes it match as little as possible (instead of as much as possible).
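To make the difference concrete, here is a minimal sketch that runs both versions on a made-up transcript. The sample text and the variable names (`sample`, `greedy`, `lazy`) are assumptions for illustration only and are not taken from a real CNN page.

```javascript
// Hypothetical transcript with two taped clips and one live guest in between.
const sample = [
  "MORGAN: Welcome back.",
  "(BEGIN VIDEO CLIP)",
  "TAPED PERSON, TAPED TITLE: This was recorded earlier.",
  "(END VIDEO CLIP)",
  "JANE DOE, LIVE GUEST: Thanks for having me.",
  "(BEGIN VIDEO CLIP)",
  "ANOTHER PERSON, ANOTHER TITLE: Also recorded.",
  "(END VIDEO CLIP)",
  "MORGAN: Good night."
].join("\n");

// Greedy: [\s\S]* runs from the first BEGIN to the LAST END.
const greedy = sample.replace(/BEGIN VIDEO CLIP[\s\S]*END VIDEO CLIP/g, "");

// Lazy: [\s\S]*? stops at the first END after each BEGIN.
const lazy = sample.replace(/BEGIN VIDEO CLIP[\s\S]*?END VIDEO CLIP/g, "");

console.log(greedy.includes("JANE DOE")); // false: the live guest was removed too
console.log(lazy.includes("JANE DOE"));   // true: only the taped clips were removed
```

The same change would apply to the BEGIN VIDEOTAPE / END VIDEOTAPE alternative in the original pattern.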
In your searchForGuests regex, try ^([A-Za-z0-9, ]+(?=:)) with the g and m flags, so that ^ matches at the start of every line rather than only at the start of the text.
If your text is this:
TOM COUGHLIN, NFL COACH: Preparation is the key to success.
MORGAN: Plus he's worn out his Oscar welcome but she's Hollywood's golden girl, Kristin Chenoweth.
It will return these matches:
TOM COUGHLIN, NFL COACH
MORGAN
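As a rough sketch of how that pattern could be applied, the snippet below runs it over the two sample lines above. The `gm` flags, the `|| []` fallback, and the comma filter are assumptions added for illustration; they are not part of the suggested regex itself.

```javascript
// The two sample lines from above, joined into one string.
const text = [
  "TOM COUGHLIN, NFL COACH: Preparation is the key to success.",
  "MORGAN: Plus he's worn out his Oscar welcome but she's Hollywood's golden girl, Kristin Chenoweth."
].join("\n");

// g = find every match, m = let ^ match at the start of each line.
const speakerPattern = /^[A-Za-z0-9, ]+(?=:)/gm;

// match() returns null when nothing is found, so fall back to an empty array
// before calling array methods on the result.
const matches = text.match(speakerPattern) || [];
console.log(matches.join("; ")); // TOM COUGHLIN, NFL COACH; MORGAN

// Assumption: entries in NAME, TITLE form contain a comma, a bare host name does not.
const guests = matches.filter((entry) => entry.includes(","));
console.log(guests.join("; ")); // TOM COUGHLIN, NFL COACH
```

The `|| []` guard also matters in the original bookmarklet, where calling join() on a null result from match() would throw.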