PHP regular expressions, again

Question

What is the difference between these two calls?

preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );

and

preg_replace( '@<(script|style)[^>]*>.*</\\1>@si', '', $string );

Answer 1

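Yes, there is a difference.

Consider this example string:

<script>bla</script><script>hello</script>

The first (non-greedy) expression matches only the first script element, <script>bla</script>.

The second (greedy) expression matches the whole string, <script>bla</script><script>hello</script>.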
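A minimal sketch of the difference, running both patterns from the question through preg_match() on the example string. preg_match() reports only the first match, which makes it easy to see how far a single match extends.

```php
<?php
// The example string from the answer.
$string = '<script>bla</script><script>hello</script>';

$lazy   = '@<(script|style)[^>]*?>.*?</\\1>@si'; // first pattern (non-greedy)
$greedy = '@<(script|style)[^>]*>.*</\\1>@si';   // second pattern (greedy)

preg_match($lazy, $string, $m);
$lazyMatch = $m[0];   // stops at the first </script>

preg_match($greedy, $string, $m);
$greedyMatch = $m[0]; // runs on to the last </script>

echo $lazyMatch, "\n";   // <script>bla</script>
echo $greedyMatch, "\n"; // <script>bla</script><script>hello</script>
```

Note that with preg_replace() this particular string ends up empty either way: the lazy pattern removes the two elements in two separate matches, while the greedy one removes them in a single match. The difference only shows in the result when there is other content between the elements.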

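The first non-greedy ? (the one in [^>]*?) probably isn't needed. [^>]* can only consume characters other than >, so it has to stop at the first > anyway, and nothing else can come between that run of non-> characters and the closing > of the tag.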
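To check that claim, this sketch matches only the opening-tag part of each pattern, with and without the ?, against a few opening tags. The inputs are hypothetical; the last one deliberately contains another > further on.

```php
<?php
// Hypothetical opening tags; the last one has a '>' later in its content.
$inputs = [
    '<script>',
    '<script type="text/javascript">',
    '<style media="screen">',
    '<script type="a">if (x > 0) {}',
];

$allSame = true;
foreach ($inputs as $input) {
    preg_match('@<(script|style)[^>]*?>@i', $input, $lazy);
    preg_match('@<(script|style)[^>]*>@i', $input, $greedy);
    // [^>] can never consume '>', so both variants stop at the first '>'.
    if ($lazy[0] !== $greedy[0]) {
        $allSame = false;
    }
    echo $lazy[0], "\n";
}

var_dump($allSame); // bool(true)
```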

I should also mention that a parser such as DOMDocument is a better way to get at script and style elements than a regular expression.

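$dom = new DOMDocument;
libxml_use_internal_errors(true); // don't print warnings for malformed or HTML5 markup
$dom->loadHTML($string);
libxml_clear_errors();
// DOMNodeList objects holding every <script> and <style> element
$scripts = $dom->getElementsByTagName('script');
$styles = $dom->getElementsByTagName('style');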
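If the goal is to remove those elements, as the preg_replace() calls in the question do, one way is sketched below. The input string is hypothetical. Because getElementsByTagName() returns a live list, the nodes are collected into an array first and removed afterwards.

```php
<?php
// Hypothetical input: two scripts, a style, and text that must survive.
$string = '<p>keep</p><script>bla</script><style>p{}</style><script>hello</script>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // don't print warnings for malformed markup
$dom->loadHTML($string);
libxml_clear_errors();

// Collect first: removing nodes while iterating a live DOMNodeList skips items.
$nodes = [];
foreach (['script', 'style'] as $tag) {
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $nodes[] = $node;
    }
}
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}

// loadHTML() adds a doctype plus <html> and <body> wrappers to the output.
$html = $dom->saveHTML();
echo $html;
```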

Answer 2

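The extra ? makes the preceding quantifier lazy (non-greedy), reversing its default behaviour: quantifiers in PHP's PCRE regular expressions are greedy by default.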

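So in your particular example, the non-greedy expression captures one script tag and its contents at a time. The greedy version starts matching at the first script tag and grabs everything, including any non-script content in between, up to the last closing script tag.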
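A short sketch of that effect with preg_replace(), using a hypothetical string that has ordinary text between two script elements:

```php
<?php
// Hypothetical input: ordinary text sits between two script elements.
$string = '<script>a</script>keep me<script>b</script>';

$lazy   = preg_replace('@<(script|style)[^>]*?>.*?</\\1>@si', '', $string);
$greedy = preg_replace('@<(script|style)[^>]*>.*</\\1>@si', '', $string);

var_dump($lazy);   // string(7) "keep me"
var_dump($greedy); // string(0) ""
```

The lazy pattern removes each element separately and leaves the text alone; the greedy pattern swallows the text along with both elements.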

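Don't rely on either expression to strip scripts for security, though; regex-based filters like these are easy to bypass. See: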

http://ha.ckers.org/xss.html