Getting the href attribute from HTML gives unwanted results

I am scraping some content from a website, and the HTML looks something like this:

<div>
    <a class="title" href="/recipe/pasta">Pasta Recipe</a>
</div>

After scraping this from the website, I use JavaScript to get the href attribute like this:

html.getElementsByTagName('a')[0].href

The problem is that this returns file:///A:/recipe/pasta, but the result I want is /recipe/pasta. Here's a Stack Snippet example of the same problem: reading href returns the path with the domain prepended, which is undesirable:

console.log(document.getElementsByTagName('a')[0].href);

<div>
    <a class="title" href="/recipe/pasta">Pasta Recipe</a>
</div>

I can fix this problem with basic string manipulation, but that seems rudimentary.

Also, the file:///A: prefix refers to the A: drive on my computer. If I run this on another computer, it will become file:///C:, representing the C: drive.

It might also help to know that I am doing this in an Electron app using Node.js.

Use getAttribute instead. The href property returns the URL resolved against the document's base URL, whereas getAttribute('href') returns just the plain value of the attribute as written in the markup and nothing else:

const href = document.querySelector('a').getAttribute('href');
console.log(href);

<div>
    <a class="title" href="/recipe/pasta">Pasta Recipe</a>
</div>
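
To illustrate why the two approaches differ, here is a minimal sketch that runs in plain Node.js without a DOM. It uses the built-in URL class to mimic the resolution that the href property performs. The helper name resolveHref and the https://example.com base URL are assumptions made for this example and are not part of the original question. In the asker's Electron app the base would be a file:// URL, which is presumably where the drive prefix comes from.

```javascript
// Hypothetical helper: approximates what the `href` property does, i.e.
// resolves the raw attribute value against the document's base URL.
function resolveHref(rawHref, baseUrl) {
  return new URL(rawHref, baseUrl).href;
}

// Assumed example values, chosen for illustration only.
const rawHref = '/recipe/pasta';                   // what getAttribute('href') returns
const baseUrl = 'https://example.com/index.html';  // stand-in for the page's base URL

const resolved = resolveHref(rawHref, baseUrl);
console.log(rawHref);   // /recipe/pasta
console.log(resolved);  // https://example.com/recipe/pasta

// If only a resolved URL is available, its path can be recovered
// with the URL API instead of manual string manipulation.
const pathOnly = new URL(resolved).pathname;
console.log(pathOnly);  // /recipe/pasta
```

If the raw attribute value is all that is needed, getAttribute remains the simpler choice; the pathname route is only useful when the resolved URL is the only thing at hand.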
