
JavaScript regex to extract title and iframe

A Google Apps Script fetches the content text of an HTTP response. An excerpt is shown below.

<p style="text-align: left;"><span style="background-color: rgb(242, 195, 20);"><span style="color: rgb(192, 80, 77);">Disclaimer:</span></span><span style="background-color: rgb(255, 255, 255);">Please note,</span><a href="http://www.g00gl3.com"><span style="background-color: rgb(255, 255, 255);">http://www.g00gl3.com</span></a><span style="background-color: rgb(255, 255, 255);"> or </span><a href="http://www.g00gl3.com"><span style="background-color: rgb(255, 255, 255);">www.G00gl3.com</span></a><span style="background-color: rgb(255, 255, 255);"> is only video embedding websites. All of the videos found here come from 3rd party video hosting sites. We do not host any of the videos. Please contact to appropriate video hosting site for any video removal.</span></p>
<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>
<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>

From this excerpt I need to extract the title (Dailymotion or Alternate Video) and the iframe.

Matching only the iframe already works with:

/<iframe(.*)\/iframe>/g

The expected output is

Dailymotion  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe>

and

Alternate Video  <br><br></span></strong></div>
<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe>

Can anybody help me write a regex that fetches only the above?

Try this, it should work:

/255\);">([a-zA-Z]+\s+.*)<br><br>/g
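A minimal sketch of how this pattern could be applied, assuming the response text is already in a string. The `html` sample is a shortened stand-in for the excerpt in the question, and the variable names are illustrative only. Capture group 1 keeps the trailing spaces before `<br><br>`, so the sketch trims it.

```javascript
// Shortened sample modelled on the excerpt in the question (assumption:
// the real response keeps each title and each iframe on its own line).
const html = [
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar" width="640" height="360" frameborder="0"></iframe></div>',
  '<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>',
  '<div style="text-align: center;"><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>',
].join('\n');

// Pattern from the answer above; the g flag lets exec() walk all matches.
const titleRe = /255\);">([a-zA-Z]+\s+.*)<br><br>/g;

const titles = [];
let m;
while ((m = titleRe.exec(html)) !== null) {
  // Group 1 is the title text plus trailing whitespace, so trim it.
  titles.push(m[1].trim());
}

console.log(titles);
```

Note that the pattern depends on the inline style ending in `255);">`, so it would stop matching if the page changed the span's background colour.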

Assuming you need to search for only those two titles, this will extract all the information you need:

[\s\S]*(Dailymotion|Alternate Video)[\s\S]*(<iframe[\s\S]*<\/iframe>)

Here's a page where you can see it working:
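One thing to watch with this pattern: every `[\s\S]*` is greedy, so a single match spans the whole text and the groups end up holding only the last title and the last iframe. The sketch below compares it with a lazy, global variant that yields one title/iframe pair per match. The lazy variant and the sample `html` are assumptions added for illustration, not part of the answer.

```javascript
// Shortened sample modelled on the excerpt in the question.
const html = [
  '<div><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>',
  '<div><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar" width="640" height="360" frameborder="0"></iframe></div>',
  '<div><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>',
  '<div><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>',
].join('\n');

// Pattern from the answer: greedy quantifiers, one match for the whole text.
const greedy = /[\s\S]*(Dailymotion|Alternate Video)[\s\S]*(<iframe[\s\S]*<\/iframe>)/;
const greedyMatch = html.match(greedy);
// Only the last pair survives in the capture groups.
console.log(greedyMatch[1]);

// Assumed variant: lazy quantifiers plus the g flag, one match per pair.
const lazy = /(Dailymotion|Alternate Video)[\s\S]*?(<iframe[\s\S]*?<\/iframe>)/g;
const pairs = Array.from(html.matchAll(lazy), (m) => ({
  title: m[1],
  iframe: m[2],
}));
console.log(pairs);
```

The lazy variant still pairs a title with the next iframe that follows it anywhere in the text, so it does not check that the two are adjacent.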

The first answer works, but I think it is not very restrictive. The regex [\s\S]*(Dailymotion|Alternate Video)[\s\S]*(<iframe[\s\S]*<\/iframe>) works for your examples, but it also matches when the HTML is malformed (you can test it).

I have written two stricter regexes; the drawback is that they are very long. The first one matches this line:

<div style="text-align: center;"><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion <br><br></span></strong></div>

Regex:

^(\<((\D+)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")*)\>).*(Dailymotion|Alternate Video).*\<\/\3\>|(\<\D+\/\>)$

https://regex101.com/r/XthACq/1

The capture groups verify that the HTML is "valid". For example, an opening tag cannot be closed by a different closing tag. Once the first line of your HTML matches, you can use the second regex to verify the iframe line.

<div style="text-align: center;"><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar? syndication=202279" width="640" height="360" frameborder="0"></iframe></div>

This line is matched by this regex:

^(\<((\D+)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")*)\>).*<(iframe)( [a-z]*=\"[\S]*|[ ]\.{0,1}[\S]*\")+\><\/\5>\<\/\3\>|(\<\D+\/\>)$

https://regex101.com/r/wBBOi5/1

As with the first regex, the HTML is verified. You can now extract the title, the link, and every attribute by using the capture groups.
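To illustrate pulling attributes out through capture groups, here is a minimal sketch. It deliberately uses a much simpler attribute pattern than the long regexes above, so it performs none of their validation. The names `attrRe` and `attrs` are illustrative, and the sketch assumes every attribute value is double-quoted.

```javascript
// One iframe tag copied from the excerpt in the question.
const iframeTag =
  '<iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe>';

// Group 1: attribute name, group 2: attribute value (double-quoted only).
const attrRe = /([a-z]+)="([^"]*)"/g;

const attrs = {};
for (const [, name, value] of iframeTag.matchAll(attrRe)) {
  attrs[name] = value;
}

// The src still carries the HTML entity &amp;, so decode it before use.
const src = attrs.src.replace(/&amp;/g, '&');

console.log(attrs.width, attrs.height, src);
```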

@l-vadim's answer is the closest, and I am using it.

/255\);">([a-zA-Z]+\s+.*)<br><br>/g
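For completeness, a rough sketch of how this title pattern might be combined with the iframe pattern from the question. The function name `extractVideos` and the sample `html` are hypothetical. It pairs titles and iframes by position, which assumes every title is followed by exactly one iframe. In Apps Script the input would presumably come from something like `UrlFetchApp.fetch(url).getContentText()`, with `url` being whatever page is fetched.

```javascript
// Hypothetical helper: returns [{ title, iframe }, ...] from the response text.
function extractVideos(html) {
  const titleRe = /255\);">([a-zA-Z]+\s+.*)<br><br>/g; // pattern from this answer
  const iframeRe = /<iframe(.*)\/iframe>/g; // pattern from the question

  const titles = Array.from(html.matchAll(titleRe), (m) => m[1].trim());
  const iframes = html.match(iframeRe) || [];

  // Pair by position; a title without a matching iframe gets null.
  return titles.map((title, i) => ({ title, iframe: iframes[i] || null }));
}

// Shortened sample modelled on the excerpt in the question.
const sampleHtml = [
  '<div><strong><span style="background-color: rgb(255, 255, 255);">Dailymotion  <br><br></span></strong></div>',
  '<div><iframe src="http://www.dailymotion.com/embed/video/foo1234567890bar" width="640" height="360" frameborder="0"></iframe></div>',
  '<div><strong><span style="background-color: rgb(255, 255, 255);">Alternate Video  <br><br></span></strong></div>',
  '<div><iframe src="http://hqq.tv/player/embed_player.php?vid=1234567890&amp;autoplay=no" width="720" height="450" frameborder="0"></iframe></div>',
].join('\n');

const videos = extractVideos(sampleHtml);
console.log(videos);
```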
