Javascript regex to extract title and iframe

A Google Apps Script gets the content text of an HTTP response. An excerpt is shown below.

<p style="text-align: left;"><span style="background-color: rgb(242, 195, 20);"><span style="color: rgb(192, 80, 77);">Disclaimer:</span></span><span style="background-color: rgb(255, 255, 255);">Please note,</span><a href="http://www.g00gl3.com"><span style="background-color: rgb(255, 255, 255);">http://www.g00gl3.com</span></a><span style="background-color: rgb(255, 255, 255);"> or </span><a href="http://www.g00gl3.com"><span style="background-color: rgb(255, 255, 255);">www.G00gl3.com</span></a><span style="background-color: rgb(255, 255, 255);"> is only video embedding websites. All of the videos found here come from 3rd party video hosting sites. We do not host any of the videos. Please contact to appropriate video hosting site for any video removal.</span></p>
<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>
<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>

From this excerpt, I need to extract each title (Dailymotion or Alternate Video) together with its iframe.

Matching only the iframe already works:

/<iframe(.*)\/iframe>/g

The expected result is

Dailymotion  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe>

and

Alternate Video  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe>

Can anybody help write a regex that fetches only the above?

Try this, it should work:

/255\);">([a-zA-Z]+\s+.*)<br><br>/g
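As a minimal sketch of how this pattern could be used, the snippet below runs it over a copy of the relevant lines from the excerpt and collects capture group 1. The `html` string and the `extractTitles` helper are assumptions made for illustration; in the real script the text would come from the HTTP response.

```javascript
// Copy of the title and iframe lines from the question (sample data for illustration).
const html = [
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>',
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>'
].join('\n');

// Hypothetical helper: returns every title captured by group 1, trimmed.
function extractTitles(text) {
  const re = /255\);">([a-zA-Z]+\s+.*)<br><br>/g;
  const titles = [];
  let m;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((m = re.exec(text)) !== null) {
    titles.push(m[1].trim());
  }
  return titles;
}

console.log(extractTitles(html)); // [ 'Dailymotion', 'Alternate Video' ]
```

Since `.` does not match line breaks, this relies on each title sitting on its own line, as in the excerpt. It also captures only the titles, not the iframes.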

Assuming you need to search for only those two titles, this will extract all the information you need:

[\s\S]*(Dailymotion|Alternate Video)[\s\S]*(<iframe[\s\S]*<\/iframe>)

Here's a page where you can see it working:
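Because every `[\s\S]*` in the pattern above is greedy, a single match spans the whole text, and on the excerpt its two groups hold only the last title and the last iframe. One possible variant, sketched here on the assumption that each title is followed by its own iframe, drops the leading `[\s\S]*`, makes the quantifiers lazy, and adds the `g` flag so that each title is paired with the iframe after it. The `html` sample and the `extractPairs` helper are hypothetical names used for illustration.

```javascript
// Copy of the title and iframe lines from the question (sample data for illustration).
const html = [
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>',
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>'
].join('\n');

// Hypothetical helper: pairs each known title with the first iframe after it.
function extractPairs(text) {
  // Lazy quantifiers (*?) stop at the nearest <iframe ...> and </iframe>.
  const re = /(Dailymotion|Alternate Video)[\s\S]*?(<iframe[\s\S]*?<\/iframe>)/g;
  const pairs = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    pairs.push({ title: m[1], iframe: m[2] });
  }
  return pairs;
}

for (const pair of extractPairs(html)) {
  console.log(pair.title, '->', pair.iframe);
}
```

The alternation is case-sensitive, so the lowercase `dailymotion` inside the first iframe URL is not mistaken for a title.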

The first answer works, but I think it is not very restrictive. The regex [\s\S]*(Dailymotion|Alternate Video)[\s\S]*(<iframe[\s\S]*<\/iframe>) works for your examples, but it also matches when the HTML is malformed (you can test it).

I have written two stricter regexes; the drawback is that they are quite long. The first one matches this line:

<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion <br><br></span></strong></div>

Regex:

^(\<((\D+)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")*)\>).*(Dailymotion|Alternate Video).*\<\/\3\>|(\<\D+\/\>)$

https://regex101.com/r/XthACq/1

The capture groups verify that the HTML is "valid". For example, an opening tag cannot be closed by a closing tag with a different name. Once the first line of your HTML matches, you can use the second regex to verify the iframe line:

<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>

It is matched by this regex:

^(\<((\D+)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")*)\>).*<(iframe)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")+\><\/\5>\<\/\3\>|(\<\D+\/\>)$

https://regex101.com/r/wBBOi5/1

Like the first regex, this one also verifies the HTML. You can now extract the title, the link, and all attributes by using capture groups.
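To illustrate the idea of pulling attributes out with capture groups, here is a rough sketch. It does not use the long validating regex above; it swaps in two much simpler patterns, one to isolate the iframe's opening tag and one to walk its `name="value"` pairs. The `iframeLine` sample and the `extractIframeAttributes` helper are assumptions for illustration, and the sketch assumes double-quoted attribute values with no `>` inside them.

```javascript
// One iframe line from the question (sample data for illustration).
const iframeLine = '<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>';

// Hypothetical helper: returns the iframe's attributes as a plain object,
// or null when the text contains no <iframe ...> opening tag.
function extractIframeAttributes(text) {
  // Group 1 holds everything between "<iframe" and the closing ">".
  const tag = /<iframe\b([^>]*)>/.exec(text);
  if (tag === null) {
    return null;
  }
  const attrs = {};
  // Group 1 is the attribute name, group 2 its double-quoted value.
  const attrRe = /([a-z-]+)="([^"]*)"/g;
  let m;
  while ((m = attrRe.exec(tag[1])) !== null) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

const attrs = extractIframeAttributes(iframeLine);
console.log(attrs.src, attrs.width, attrs.height);
```

The `src` value is returned as it appears in the markup, so HTML entities such as `&amp;` are left undecoded.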

@l-vadim's answer is the closest, and I am using it.

/255\);">([a-zA-Z]+\s+.*)<br><br>/g
