URL extraction from String in JavaScript
I'm getting raw HTML data back from a service and need to extract a URL from the string. Specifically, there is a section of the HTML where the URL string exists, in an attribute called 'data-url'.
Is there a way I can extract just the URL immediately following 'data-url'? Here's an example:
let html_str = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">'
I just need to extract the domain and store it.
You can create a URL object from a string using new URL(text) and get the hostname of that object. The only thing that remains is choosing how to extract the URL from the HTML.
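The new URL(text).hostname step can be tried on its own. This is a minimal sketch, assuming a runtime where URL is a global (modern browsers, Node.js 10 and later); getHostname is a hypothetical helper name. Because the URL constructor throws a TypeError for strings that are not absolute URLs, the sketch returns null in that case instead of throwing:

```javascript
// Hypothetical helper: returns the hostname of an absolute URL string,
// or null if the string cannot be parsed as a URL.
function getHostname(text) {
  try {
    return new URL(text).hostname;
  } catch (e) {
    // new URL() throws a TypeError for relative or malformed input
    return null;
  }
}

console.log(getHostname("https://apple.stackexchange.com/questions/323174")); // apple.stackexchange.com
console.log(getHostname("/questions/323174")); // null (relative URL, no base given)
```

Note that hostname leaves out any port, while host keeps it: for https://example.com:8080/ the hostname is example.com and the host is example.com:8080.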
Using regex
var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
// Capture everything between data-url=" and the next double quote; match is null if absent.
var match = html.match(/data-url="([^"]*)"/);
if (match) {
  console.log(new URL(match[1]).hostname); // apple.stackexchange.com
}
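If the HTML may contain several data-url attributes, or uses single quotes, the regex can be widened. This is a sketch, assuming String.prototype.matchAll is available (ES2020; Node.js 12 and later); dataUrlHostnames is a hypothetical name. Like the one-liner above, it does not decode character references such as &amp;:

```javascript
// Sketch: collect hostnames from every data-url attribute in an HTML string.
// Assumes values are quoted with " or '; unquoted values are not matched,
// and character references such as &amp; are NOT decoded.
function dataUrlHostnames(html) {
  // \s before the name avoids matching attributes like "x-data-url";
  // \1 requires the closing quote to match the opening one.
  const pattern = /\sdata-url\s*=\s*(["'])(.*?)\1/g;
  const hosts = [];
  for (const match of html.matchAll(pattern)) {
    try {
      hosts.push(new URL(match[2]).hostname);
    } catch (e) {
      // skip values that are not absolute URLs
    }
  }
  return hosts;
}

const page =
  '<div id="tv_web_answer_source" data-url="https://apple.stackexchange.com/questions/323174">' +
  "<a data-url='https://example.org/a'></a>";
console.log(dataUrlHostnames(page).join(", ")); // apple.stackexchange.com, example.org
```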
Using HTML
var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
// Let the browser parse the string, then query the parsed element for the attribute.
var element = document.createElement("div");
element.innerHTML = html;
var elementWithData = element.querySelector("[data-url]");
if (elementWithData) {
  console.log(new URL(elementWithData.getAttribute("data-url")).hostname);
}
I would personally go with the HTML solution, since if (for some reason) the URL contains the text \" , the regex will fail (though you could add that constraint to the pattern).
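A concrete case where the regex and the DOM disagree: when the browser parses HTML, it decodes character references in attribute values, so &amp; in the markup becomes & in getAttribute's result, while a regex returns the raw text. This is a minimal sketch, assuming only a handful of common references need handling; decodeBasicEntities is a hypothetical helper, not a complete HTML decoder:

```javascript
// Hypothetical helper: decodes only a few common character references.
// A real HTML parser handles many more (all named, decimal and hex forms).
function decodeBasicEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

var html = '<div data-url="https://example.com/search?q=a&amp;page=2">';
var match = html.match(/data-url="([^"]*)"/);
if (match) {
  var raw = match[1];                     // still contains "&amp;"
  var decoded = decodeBasicEntities(raw); // "https://example.com/search?q=a&page=2"
  console.log(new URL(decoded).searchParams.get("page")); // prints 2
}
```

Without the decoding step, the query string would be read as q=a plus a parameter named "amp;page", so searchParams.get("page") would return null.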
Also, if you need to support older browsers, you should use getAttribute rather than dataset. This only matters for old versions of Internet Explorer, though: dataset is not supported before IE 11.
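The getAttribute/dataset trade-off can be wrapped in one function. This is a sketch in ES5-style JavaScript; readDataAttribute is a hypothetical name, and the camelCase conversion mirrors the way dataset exposes data-foo-bar as dataset.fooBar. Both branches return null for a missing attribute (dataset itself yields undefined):

```javascript
// Hypothetical helper: read data-<name> from an element, preferring dataset
// and falling back to getAttribute on browsers without dataset support.
function readDataAttribute(element, name) {
  if (element.dataset) {
    // dataset keys are camelCased: "foo-bar" -> "fooBar"
    var key = name.replace(/-([a-z])/g, function (_, letter) {
      return letter.toUpperCase();
    });
    var value = element.dataset[key];
    return value === undefined ? null : value; // match getAttribute's null
  }
  return element.getAttribute("data-" + name);
}
```

In a browser this would be called as readDataAttribute(document.getElementById('tv_web_answer_source'), 'url').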
Just use getAttribute
document.getElementById('tv_web_answer_source').getAttribute('data-url')
Even better, use dataset (because the attribute you want starts with data-):
document.getElementById('tv_web_answer_source').dataset.url
https://developer.mozilla.org/fr/docs/Web/API/HTMLElement/dataset
The easiest approach is to use the DOM to get the information: set your HTML string as the content of a new element, select the element that carries the attribute, and use dataset to get the attribute's value.
var div = document.createElement("div");
div.innerHTML = `<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)"></div>`;
var str = div.querySelector('[data-url]').dataset.url;
var host = new URL(str).hostname;
console.log(host, str);
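Maybe use split (assuming s holds the HTML string):
// Take the text after data-url=" and cut it at the next double quote.
var url = s.split('data-url="')[1].split('"')[0];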
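The split idea can be made safer. This is a sketch, assuming the attribute is preceded by a single space and its value is double-quoted with no embedded quote characters; extractAttribute is a hypothetical helper that returns null instead of throwing when the attribute is missing:

```javascript
// Hypothetical helper: extract the double-quoted value of attribute `name`
// from an HTML string using split, returning null if it is not present.
// The leading space keeps "data-url" from matching inside "x-data-url",
// but also means an attribute preceded by a tab or newline is missed.
function extractAttribute(s, name) {
  var parts = s.split(" " + name + '="');
  if (parts.length < 2) {
    return null; // attribute not found
  }
  return parts[1].split('"')[0];
}

var s = '<div class="tv-focusable" data-url="https://apple.stackexchange.com/questions/323174" onclick="onUrlClick(this)">';
var url = extractAttribute(s, "data-url");
console.log(url);                   // https://apple.stackexchange.com/questions/323174
console.log(new URL(url).hostname); // apple.stackexchange.com
```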
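Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.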