I am currently using PHP's cURL functions to fetch content from a URL. After getting the content, I need to inspect the returned HTML, find a 'video' element that has a given style attribute, and extract the src value of its source element. Currently I get the page, but how can I get this value? Here is my code to get the page:
<?php
$Url = 'some site';
if (!function_exists('curl_init')){
die('CURL is not installed!');
}
$ch = curl_init($Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // add this one, it seems to spawn redirect 301 header
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); // spoof
$output = curl_exec($ch);
curl_close($ch);
echo $output;
The code above works and outputs the page. Inspecting the elements in the page's output, I found this:
<div class="webstarvideo">
<video style="width:100%;height:100%" preload="none" class="">
<source src="I NEED THIS" type="video/mp4"></video>
<div class="webstarvideodoul">
<canvas></canvas>
</div>
</div>
I need the src of the video in the above code, how can I do that?
At PHP level :
You can use a regex with preg_match or use the PHP DOMDocument class :
DOM
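$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about HTML5 tags such as <video>
$doc->loadHTML($output);
$videoSource = $doc->getElementsByTagName('source')->item(0); // first <source> element, or null if none
if ($videoSource !== null) {
    echo $videoSource->getAttribute('src');
}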
With REGEX
$array = array();
preg_match("/source src=\"([^\"]*)\" type=\"video\/mp4\">/i", $output, $array);
echo $array[1];
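Since the question asks for a video with a particular style attribute, the DOM approach can be narrowed to that. The following is only a minimal sketch: findVideoSources is a hypothetical helper name, the sample markup is copied from the question, and it assumes the style attribute must match the given string exactly.

```php
<?php
// Sample markup from the question; in practice this would be the
// $output string returned by curl_exec().
$output = '<div class="webstarvideo">
<video style="width:100%;height:100%" preload="none" class="">
<source src="I NEED THIS" type="video/mp4"></video>
</div>';

// Hypothetical helper: collects the src of every <source> nested in a
// <video> whose style attribute equals $style exactly.
function findVideoSources(string $html, string $style): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    $previous = libxml_use_internal_errors(true); // silence HTML5 tag warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('video') as $video) {
        if ($video->getAttribute('style') !== $style) {
            continue; // not the video we are looking for
        }
        foreach ($video->getElementsByTagName('source') as $source) {
            $sources[] = $source->getAttribute('src');
        }
    }
    return $sources;
}

print_r(findVideoSources($output, 'width:100%;height:100%'));
```

Comparing the attribute in PHP rather than inside an XPath expression avoids having to quote the style string; if the real page formats the style differently (extra spaces, different order), the comparison would need to be loosened.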
If you want to get the video's SRC as a PHP variable, you need to extract it from the string, by checking where "type" is:
$output = '<div class="webstarvideo">
<video style="width:100%;height:100%" preload="none" class="">
<source src="I NEED THIS" type="video/mp4"></video>
<div class="webstarvideodoul">
<canvas></canvas>
</div>
</div>';
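$src_marker = 'src="';
$src_start = strpos($output, $src_marker) + strlen($src_marker); // first character of the src value
$type_position = strpos($output, ' type=', $src_start); // the space before the type attribute
$video_src = substr($output, $src_start, $type_position - $src_start - 1);
echo $video_src; // I NEED THIS
$src_start is the offset of the first character after the opening double quote of the src attribute, and the extra 1 subtracted from the length accounts for the closing double quote that sits just before the space preceding type. Computing the start with strpos instead of a hard-coded offset keeps this working when the whitespace or markup before the source tag changes.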
Hope this helps! :)
With PHP, you can use Simple HTML DOM Parser for this; its query syntax is similar to jQuery's.
$Url = 'some site';
if (!function_exists('curl_init')){
die('CURL is not installed!');
}
$ch = curl_init($Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // add this one, it seems to spawn redirect 301 header
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); // spoof
$output = curl_exec($ch);
curl_close($ch);
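// The Simple HTML DOM library must be included before this point.
$html = str_get_html($output);
$source = $html->find('video source', 0); // the src attribute is on <source>, not on <video>
$videoSrc = $source ? $source->src : null;
var_dump($videoSrc);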
Assuming that $output is the complete text, you can use a regex:
preg_match_all("/(?<=\<source).*?src=\"([^\"]+)\"/", $output, $all);
print_r($all[1]); // all the links will be in this array
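One caveat with a pattern like this: `.*?` can run past the end of the source tag on the same line and pick up the src of a later element. A slightly tighter variant is sketched below; the sample string is made up for illustration and is not from the page in the question.

```php
<?php
// Hypothetical input: two <source> tags plus an unrelated <img> on one line.
$output = '<video><source type="video/mp4" src="a.mp4">'
    . '<source src="b.webm" type="video/webm"></video><img src="poster.jpg">';

// [^>]*? keeps the match inside a single <source ...> tag, so the <img>
// src is never captured, and attribute order within the tag does not matter.
preg_match_all('/<source\b[^>]*?\bsrc="([^"]+)"/i', $output, $all);

print_r($all[1]); // the captured src values, in document order
```

This still assumes double-quoted attribute values; for anything less regular than that, a DOM parser is the safer choice.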
Use document.querySelector() to select the element, then read its src attribute with the element's getAttribute() method. Note that this is JavaScript and runs in the browser, not in PHP.
var video = document.querySelector('.webstarvideo video source');
console.log(video.getAttribute('src'));
<div class="webstarvideo"> <video style="width:100%;height:100%" preload="none" class=""> <source src="I NEED THIS" type="video/mp4"></video> <div class="webstarvideodoul"> <canvas></canvas> </div> </div>