URL extraction from String in JavaScript

I'm getting raw HTML data back from a service and need to extract a URL from the string. Specifically, the URL sits in one section of the HTML, as the value of an attribute called 'data-url'. Is there a way to extract just the URL immediately following 'data-url'? Here's an example:

let html_str = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">'

I just need to strip out the domain and store it.

You can create a URL object from a string using new URL(text) and read the hostname of that object. The only thing that remains is choosing how to extract the URL from the HTML.
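To illustrate the new URL(text) step on its own, here is a minimal sketch that returns null instead of throwing when the text is not an absolute URL. The helper name hostnameOf is made up for this example; URL and its hostname property are standard in browsers and Node.js.

```javascript
// Hypothetical helper (not from the original answer): returns the hostname
// of an absolute URL string, or null when the string cannot be parsed.
function hostnameOf(text) {
  try {
    return new URL(text).hostname;
  } catch (err) {
    // new URL() throws a TypeError for relative or malformed input.
    return null;
  }
}

const sampleUrl = 'https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing';
console.log(hostnameOf(sampleUrl));   // apple.stackexchange.com
console.log(hostnameOf('not a url')); // null
```

Returning null keeps the caller from needing its own try/catch, which can be handy when the HTML comes from a service you do not control.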

Using regex

var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
console.log(new URL(html.match(/data-url="([^"]*)"/)[1]).hostname);

Using HTML

var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
var element = document.createElement("div");
element.innerHTML = html;
var elementWithData = element.querySelector("[data-url]");
if (elementWithData) {
  console.log(new URL(elementWithData.getAttribute("data-url")).hostname);
}

I would personally go with the HTML solution, since if (for unknown reasons) the URL contains the text \\" , the regex will stop at that quote and return a truncated URL (though you could just add that constraint).
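If you do stay with a regex, a slightly more defensive version might look like the sketch below. It is only an illustration: the helper name extractDataUrl and the shortened sample URL are assumptions made for this example. It accepts single- or double-quoted values and returns null when the attribute is missing, but it still returns the raw attribute text, so HTML entities such as &amp; are not decoded the way the DOM would decode them.

```javascript
// Hypothetical helper: pulls the raw data-url attribute value out of an
// HTML string. Accepts double- or single-quoted values and returns null
// when no data-url attribute is found.
function extractDataUrl(htmlText) {
  const match = /\sdata-url\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(htmlText);
  if (!match) {
    return null;
  }
  // Exactly one of the two capture groups is filled in.
  return match[1] !== undefined ? match[1] : match[2];
}

// Shortened sample markup, assumed for illustration.
const snippet = '<div id="tv_web_answer_source" data-url="https://apple.stackexchange.com/questions/323174" onclick="onUrlClick(this)">';
const found = extractDataUrl(snippet);
console.log(found);                                    // https://apple.stackexchange.com/questions/323174
console.log(found ? new URL(found).hostname : null);   // apple.stackexchange.com
console.log(extractDataUrl('<div class="x"></div>'));  // null
```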

Also, if you want ES5 compatibility you should use getAttribute rather than dataset. This only matters for older versions of IE (IE 10 and earlier; IE 11 supports dataset).

Just use getAttribute:

document.getElementById('tv_web_answer_source').getAttribute('data-url')

Even better, use dataset (because the attribute you want starts with data-):

document.getElementById('tv_web_answer_source').dataset.url

https://developer.mozilla.org/fr/docs/Web/API/HTMLElement/dataset
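For reference, dataset maps attribute names to property names by dropping the data- prefix and camel-casing what follows each remaining dash, which is why data-url becomes dataset.url. The sketch below mimics that rule in plain JavaScript; toDatasetKey is a made-up helper and data-source-url is a hypothetical attribute used only to show the camel-casing.

```javascript
// Hypothetical helper mirroring the naming rule dataset applies:
// drop the "data-" prefix, then turn each "-x" into "X".
function toDatasetKey(attributeName) {
  if (!attributeName.startsWith('data-')) {
    throw new Error('not a data-* attribute: ' + attributeName);
  }
  return attributeName
    .slice('data-'.length)
    .replace(/-([a-z])/g, function (_, letter) {
      return letter.toUpperCase();
    });
}

console.log(toDatasetKey('data-url'));        // url
console.log(toDatasetKey('data-source-url')); // sourceUrl
```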

The easiest thing would be to use the DOM to get the information. Set your HTML string as the content of a new element, select the element that carries the attribute, and use dataset to get the attribute's value.

var div = document.createElement("div");
div.innerHTML = `<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)"></div>`;
var str = div.querySelector('[data-url]').dataset.url;
var host = new URL(str).hostname;
console.log(host, str);
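A variation on the same idea, swapping innerHTML for DOMParser, is sketched below. The reason to consider it is that the HTML comes from a service: a document produced by DOMParser should not run scripts or inline event handlers from the parsed markup, whereas assigning untrusted HTML to innerHTML can fire handlers such as onerror on images. The helper name hostnameFromHtml and the shortened sample URL are assumptions for this example, and the code is browser-only.

```javascript
// Hypothetical helper: parse an HTML string into a separate document and
// return the hostname of the first data-url attribute, or null.
// Browser-only: DOMParser is not available in plain Node.js.
function hostnameFromHtml(htmlText) {
  const doc = new DOMParser().parseFromString(htmlText, 'text/html');
  const el = doc.querySelector('[data-url]');
  if (!el) {
    return null; // no element carries a data-url attribute
  }
  try {
    return new URL(el.dataset.url).hostname;
  } catch (err) {
    return null; // the attribute was present but not an absolute URL
  }
}

// Guarded so the snippet also loads cleanly outside a browser.
if (typeof DOMParser !== 'undefined') {
  console.log(hostnameFromHtml('<div data-url="https://apple.stackexchange.com/questions/323174"></div>'));
}
```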

Maybe use:

url = s.split('data-url="')[1].split('"')[0];
