Regular expression to find URLs in a string

Question

I have a string

<h1>hello/h1>
<script src="http://www.test.com/file1.js"></script>
<script src="http://www.test.com/file2.js"></script>
<p>bye</p>

and I need to build an array from the URLs found in that string.

['http://www.test.com/file1.js', 'http://www.test.com/file2.js']

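I also need to replace each of those lines in full (including the script tags) with nothing.

So far, this is what I have for finding the URLs: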

^(<script src=")(.*)("><\/script>)$

The problem is that it only works for

<script src="http://www.test.com/file1.js"></script>

If I define my script like this

<script id="something" src="http://www.test.com/file1.js"></script>

it does not work.

Answer 1

Consider using a proper HTML parser such as cheerio instead: find the <script> tags, remove them, and push each src into an array:

const cheerio = require('cheerio');

const htmlStr = `<h1>hello/h1>
<script src="http://www.test.com/file1.js"></script>
<script src="http://www.test.com/file2.js"></script>
<p>bye</p>`;
const $ = cheerio.load(htmlStr);

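const urls = [];
$('script').each((_, script) => {
  // script is a parsed node, not a browser DOM element, so read src via attr()
  urls.push($(script).attr('src'));
  $(script).remove();
});
const result = $('body').html();
console.log(urls);
console.log(result);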

Answer 2

To get the URLs, you can use the following:

^<script.*?src="([^"]*)".*?><\/script>$

This handles attributes that come before or after the src attribute.
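
A pattern like this still has to be applied to every line of the string. The sketch below is one possible way to do that, assuming the g and m flags so that ^ and $ match at each line, and [^"]* for the attribute value so the capture stops at the closing quote even when more attributes follow. SCRIPT_LINE and extractScriptUrls are made-up names, and the defer attribute in the sample is added purely for illustration.

```javascript
// Hypothetical helper (not from the answer above): collects script URLs
// and returns the HTML with the matching lines removed.
const SCRIPT_LINE = /^<script\b[^>]*?\ssrc="([^"]*)"[^>]*><\/script>$\n?/gm;

function extractScriptUrls(html) {
  const urls = [];
  // The replacer receives the full match first, then each capture group.
  const stripped = html.replace(SCRIPT_LINE, (_match, url) => {
    urls.push(url);
    return ''; // drop the whole line, including its trailing newline
  });
  return { urls, stripped };
}

const html = `<h1>hello/h1>
<script src="http://www.test.com/file1.js"></script>
<script id="something" src="http://www.test.com/file2.js" defer></script>
<p>bye</p>`;

const { urls, stripped } = extractScriptUrls(html);
console.log(urls);
console.log(stripped);
```

With this input, urls should hold the two file URLs and stripped should keep only the <h1> and <p> lines. Doing both jobs in a single replace call avoids scanning the string twice.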

Answer 3

This regex may help you get those URLs:

^<.+="(.+)"><\/.+>$

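It creates a single capture group that holds your target URL and filters out everything else. It also works with <a> tags and other similar tags that follow an open/close pattern, provided the closing tag directly follows the opening tag and the URL is the last attribute.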
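
One way to see what this pattern actually captures is to run it against a few lines. The sketch below assumes the m flag; the sample lines beyond the first two are made up for illustration, as are the names TAG_VALUE, samples and captured. Because .+ is greedy, the group ends up holding the value of the last attribute on the line, and a tag with text between its opening and closing parts does not match at all.

```javascript
// Pattern from the answer above, with the m flag so ^ and $ match per line.
const TAG_VALUE = /^<.+="(.+)"><\/.+>$/m;

// Sample lines; only the first two come from the question.
const samples = [
  '<script src="http://www.test.com/file1.js"></script>',
  '<script id="something" src="http://www.test.com/file1.js"></script>',
  '<script src="http://www.test.com/file1.js" id="something"></script>',
  '<a href="http://www.test.com/page"></a>',
  '<a href="http://www.test.com/page">link</a>',
];

// Map each line to its captured group, or null when there is no match.
const captured = samples.map((line) => {
  const match = TAG_VALUE.exec(line);
  return match ? match[1] : null;
});

console.log(captured);
```

The third sample yields something rather than the URL, and the last one yields null, so this pattern is best treated as a quick fit for input whose shape is already known.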

Answer 4

Use this instead:

^(<script )(.*)(src=")(.*)("><\/script>)$

The fourth group is the URL.

Or use ^(?:<script )(?:.*)(?:src=")(.*)(?:"><\/script>)$ with non-capturing groups, in which case the URL is the first group.
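
To make the group numbering concrete, the sketch below runs both patterns against the line from the question that carries an id attribute. The variable names are illustrative only.

```javascript
const line = '<script id="something" src="http://www.test.com/file1.js"></script>';

// Capturing groups everywhere: the URL ends up in group 4.
const capturing = /^(<script )(.*)(src=")(.*)("><\/script>)$/;
// Non-capturing groups around the fixed parts: the URL is group 1.
const nonCapturing = /^(?:<script )(?:.*)(?:src=")(.*)(?:"><\/script>)$/;

const urlFromCapturing = line.match(capturing)[4];
const urlFromNonCapturing = line.match(nonCapturing)[1];

console.log(urlFromCapturing);
console.log(urlFromNonCapturing);
```

Both variables should hold the same URL. The non-capturing form keeps the index stable if more fixed parts are added to the pattern later.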

4 answers

Answer 1: score 3, 2019-04-18 22:22:05
Answer 2: score -1, 2019-04-18 22:20:51
Answer 3: score -1, 2019-04-18 22:23:43
Answer 4: score -1, 2019-04-19 12:25:52
