PHP DOM: getting all script src URLs from a website

Question

I want to get all script src links from a website using curl and DOM.

I have the following code:

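// $dom is a DOMDocument already loaded with the HTML fetched via curl
$scripts = $dom->getElementsByTagName('script');

foreach ($scripts as $script) {
    // getAttribute() returns an empty string when the attribute is missing
    $src = $script->getAttribute('src');
    if ($src !== '') {
        echo $src, PHP_EOL;
    }
}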

This works fine, but what happens when the website has a script tag like the following:

<script type="text/javascript">
window._wpemojiSettings = {"source":{"concatemoji":"http:\/\/domain.com\/wp-includes\/js\/wp-emoji-release.min.js?ver=4.2.4"}}; ........
</script>

I also need to get the script URL embedded in this tag. How can I do that?

Answer 1

If the first parser comes back empty, I would build a second parser using a regular expression, i.e.:

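$html = file_get_contents("http://somesite.com/");

// Capture the first http...js URL (with optional query string) after each "<script"
preg_match_all('/<script.*?(http.*?\.js(?:\?.*?)?)"/si', $html, $matches, PREG_PATTERN_ORDER);
foreach ($matches[1] as $url) {
    // Undo the JSON escaping of slashes: "\/" becomes "/"
    echo str_replace("\\/", "/", $url), PHP_EOL;
}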

You may need to adjust the regex to make it work with other websites, but the code above should give you an idea of what you need.

Demo: http://ideone.com/Fwf6Mb
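As a rough sketch of how the DOM loop and the regex fallback could be combined (not a definitive implementation): take src where a script element has one, and otherwise scan the inline script text for quoted URLs. The function name collectScriptUrls and the URL pattern are assumptions made for illustration. The pattern only catches absolute http(s) URLs ending in .js inside double-quoted strings.

```php
<?php
// Hypothetical helper: collects script URLs from src attributes and from
// inline script bodies (e.g. JSON-escaped URLs such as http:\/\/...).
function collectScriptUrls(string $html): array
{
    if ($html === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumed pattern: a double-quoted http(s) URL ending in .js,
    // optionally followed by a query string.
    $pattern = '~"(https?:[^"]*?\.js(?:\?[^"]*)?)"~i';

    $urls = [];
    foreach ($dom->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
            continue;
        }
        // Inline script: fall back to scanning its text content.
        if (preg_match_all($pattern, $script->textContent, $m)) {
            foreach ($m[1] as $escaped) {
                // Undo JSON escaping of slashes.
                $urls[] = str_replace('\\/', '/', $escaped);
            }
        }
    }

    // Drop duplicates while keeping the original order.
    return array_values(array_unique($urls));
}
```

Calling collectScriptUrls($html) on the HTML returned by curl would give a flat array of URLs. Relative src values are returned as written, so resolving them against the page URL is left to the caller.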

Regex explanation:

<script.*?(http.*?\.js(?:\?.*?)?)"
----------------------------------

Match the character string “<script” literally «<script»
Match any single character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regex below and capture its match into backreference number 1 «(http.*?\.js(?:\?.*?)?)»
   Match the character string “http” literally «http»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “.” literally «\.»
   Match the character string “js” literally «js»
   Match the regular expression below «(?:\?.*?)?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match the character “?” literally «\?»
      Match any single character «.*?»
         Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “"” literally «"»
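To see the breakdown above in action, here is a small sketch that applies the same pattern to the inline script from the question. The sample string is the question's snippet with the trailing "........" left out. The variable names $raw and $url are only for illustration.

```php
<?php
// Sample input taken from the question (elided tail omitted).
$html = '<script type="text/javascript">'
      . 'window._wpemojiSettings = {"source":{"concatemoji":"http:\/\/domain.com\/wp-includes\/js\/wp-emoji-release.min.js?ver=4.2.4"}};'
      . '</script>';

preg_match_all('/<script.*?(http.*?\.js(?:\?.*?)?)"/si', $html, $matches, PREG_PATTERN_ORDER);

// $matches[0] holds the full matches, $matches[1] holds capture group 1.
$raw = $matches[1][0];

// The captured URL still has JSON-escaped slashes ("\/"), so unescape them.
$url = str_replace('\\/', '/', $raw);

echo $url, PHP_EOL;
// prints: http://domain.com/wp-includes/js/wp-emoji-release.min.js?ver=4.2.4
```

Because every match has to start at a literal "<script", the pattern yields at most one URL per "<script" occurrence. An inline script that contains several URLs would need a different pattern or a second pass over its body.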

Regular expression tutorial:

http://www.regular-expressions.info/tutorial.html

Solution 1
-1 Accepted 2015-09-08 20:36:03
