Relative to absolute paths in HTML

I need to create newsletters from a URL. To do that, I:

  1. Create a WebClient.
  2. Use WebClient's DownloadData method to get the page source as a byte array.
  3. Convert the source HTML byte array to a string and set it as the newsletter content.

However, I have some trouble with paths. All elements' sources are relative (/img/welcome.png), but I need absolute ones, like http://www.example.com/img/welcome.png.

How can I do this?
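The mapping asked for here is ordinary URL resolution: each relative reference is resolved against the URL of the page it was downloaded from. A minimal sketch using the WHATWG URL class built into browsers and Node.js (JavaScript rather than C# so it runs on its own; the function name toAbsolute and the example page URL are illustrative assumptions):

```javascript
// Resolve a possibly-relative reference against the URL of the page it appeared on.
// Root-relative paths ("/img/...") keep only the scheme and host of the page URL,
// document-relative paths ("img/...") resolve against the page's directory,
// and already-absolute URLs come back unchanged.
function toAbsolute(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

console.log(toAbsolute("/img/welcome.png", "http://www.example.com/news/today.html"));
// → http://www.example.com/img/welcome.png
console.log(toAbsolute("img/welcome.png", "http://www.example.com/news/today.html"));
// → http://www.example.com/news/img/welcome.png
```

Note the difference between the two calls: a leading "/" discards the page's path, which is why a plain string prefix of the full page URL is not enough in general.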

One possible way to solve this is to use the HtmlAgilityPack library.

An example (fixing links):

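using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// sourceUrl is the address of the page being turned into a newsletter
WebClient client = new WebClient();
byte[] requestHTML = client.DownloadData(sourceUrl);
string sourceHTML = Encoding.UTF8.GetString(requestHTML);

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(sourceHTML);

Uri baseUri = new Uri(sourceUrl);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        if (!string.IsNullOrEmpty(att.Value))
        {
            // Resolve relative hrefs against the page URL; absolute ones pass through unchanged
            att.Value = new Uri(baseUri, att.Value).AbsoluteUri;
        }
    }
}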
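Each href in the loop has to be resolved against the page URL. One possible resolver that is a little more selective, sketched in JavaScript (the name absoluteUrlByRelative and the list of skipped schemes are assumptions, not part of the answer): it leaves empty values, in-page anchors and non-HTTP links such as mailto: alone, because rewriting "#top" to an absolute URL would send newsletter readers back to the original page instead of jumping within the newsletter.

```javascript
// Hypothetical helper: resolve an href against the page it was downloaded from,
// but keep values that should not be rewritten exactly as they are.
function absoluteUrlByRelative(href, pageUrl) {
  const value = href.trim();
  // Empty values and in-page anchors ("#top") are kept as-is.
  if (value === "" || value.startsWith("#")) return href;
  // Scheme-only links (mailto:, tel:, javascript:, data:) are kept as-is.
  if (/^(mailto|tel|javascript|data):/i.test(value)) return href;
  try {
    return new URL(value, pageUrl).href;
  } catch {
    // Not a valid URL reference: leave it unchanged rather than fail.
    return href;
  }
}
```

Which schemes to skip is a policy choice; the list above is only an example.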

If the request comes from your own site (same-domain links), you can use this:

new Uri(Request.Url, "/img/welcome.png").ToString();

If you're in a non-web app, or you want to hardcode the domain name:

new Uri(new Uri("http://www.mysite.com"), "/img/welcome.png").ToString();

Instead of resolving (completing) the relative paths yourself, you can try adding a base element whose href attribute is set to the original page's base URI.

Placed as the first child of the head element, it makes the browser resolve all following relative paths against the original location, not against where the document (the newsletter) is located or served from.
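Since the question already has the page as an HTML string, the base element can be added with a small string edit. A minimal JavaScript sketch, assuming the page contains an opening head tag (addBaseElement is a hypothetical name); it does not check whether a particular mail client honours the base element:

```javascript
// Hypothetical helper: insert <base href="..."> as the first child of <head>,
// so relative src/href values resolve against pageUrl instead of the newsletter's location.
function addBaseElement(html, pageUrl) {
  // Escape the characters that matter inside a double-quoted attribute value.
  const href = pageUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const baseTag = `<base href="${href}">`;
  // \b keeps <header> from matching; attributes on <head> are allowed.
  const headOpen = /<head\b[^>]*>/i;
  if (!headOpen.test(html)) {
    // No <head> tag found: leave the document unchanged (a fuller version would create one).
    return html;
  }
  return html.replace(headOpen, (tag) => tag + baseTag);
}
```

A regular expression is used here only to keep the sketch dependency-free; with an HTML parser such as HtmlAgilityPack the same step is "prepend a base node to head".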

In Firefox, a seemingly tautological round trip of getting and then setting every src/href attribute results in complete (absolute) paths being written into the serialized HTML document, which can then be scripted or saved. This works because the src and href DOM properties return the resolved absolute URL, so assigning that value back stores it in the attribute:

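// Reading .src returns the resolved absolute URL; assigning it back
// writes that absolute URL into the src attribute (repeat for [href] / .href).
var nodes = document.querySelectorAll('[src]');
var report = "";
for (var i = 0; i < nodes.length; i++) {
  var absolute = nodes[i].src;
  report += absolute + "\n";
  nodes[i].src = absolute;
}
alert(report);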

Of course, none of the solutions above handles (or was tested against) paths given through the url() function: in STYLE elements (for background images or content rules), in style attributes on individual nodes, and in particular src/href values stated via url().
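Covering those cases with the rewriting approach would mean resolving every url(...) reference in the CSS text as well. A rough JavaScript sketch, assuming the CSS (a STYLE element's content or a style attribute's value) is available as a string; resolveCssUrls is a hypothetical name, and the regular expression does not handle escaped quotes or parentheses inside URLs. For an external stylesheet, relative url() values resolve against the stylesheet's own URL, so that address would be the base there.

```javascript
// Hypothetical helper: rewrite url(...) references in a CSS string to absolute URLs.
// Handles url(x), url('x') and url("x"); the original quote style is preserved.
function resolveCssUrls(cssText, pageUrl) {
  return cssText.replace(/url\(\s*(['"]?)([^'")]+)\1\s*\)/gi, (match, quote, ref) => {
    const value = ref.trim();
    // data: URIs are self-contained and left alone
    if (/^data:/i.test(value)) return match;
    try {
      return `url(${quote}${new URL(value, pageUrl).href}${quote})`;
    } catch {
      return match; // leave anything unparseable unchanged
    }
  });
}
```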

Therefore, bringing the base-element approach to a valid, tested state (checked against a compatibility list) seems the more promising route to me.

You have a few options:

  1. You can convert your byte array to a string and do a find-and-replace.
  2. You can create a DOM object: convert the byte array to a string, load it, and prefix the base URL to the attributes that need it (basically, you are looking for any src or href attribute that doesn't contain http: or https:).

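' Requires Imports System.Net and Imports System.Text at the top of the file
Console.Write(ControlChars.Cr & "Please enter a Url (for example, http://www.msn.com): ")
Dim remoteUrl As String = Console.ReadLine()
Dim myWebClient As New WebClient()
Console.WriteLine("Downloading " & remoteUrl)
Dim myDataBuffer As Byte() = myWebClient.DownloadData(remoteUrl)
' UTF-8 instead of ASCII so non-ASCII characters survive
Dim download As String = Encoding.UTF8.GetString(myDataBuffer)
' Scheme + host only (e.g. "http://www.msn.com"), so root-relative paths like "/img/x.png" resolve correctly
' (protocol-relative values such as "//cdn.example.com/..." would still be mangled)
Dim siteRoot As String = New Uri(remoteUrl).GetLeftPart(UriPartial.Authority)
' String.Replace returns a new string; the result must be assigned back
download = download.Replace("src=""/", "src=""" & siteRoot & "/")
download = download.Replace("href=""/", "href=""" & siteRoot & "/")
Console.WriteLine(download)
Console.WriteLine("Download successful.")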

This is very contrived, and most of it is taken directly from http://msdn.microsoft.com/en-us/library/xz398a3f.aspx, but it illustrates the basic principle behind option 1.
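The same find-and-replace idea as a JavaScript sketch (prefixRootRelative is a hypothetical name). It differs from a blind string replace in two assumed ways: the prefix is only the origin (scheme and host) of the page URL, because values like "/img/welcome.png" are relative to the site root, and protocol-relative values ("//cdn...") are skipped. Like the VB example, it only sees double-quoted, root-relative src/href values; document-relative values such as "img/a.png" are left untouched.

```javascript
// Hypothetical helper: prefix root-relative src/href values with the page's origin.
function prefixRootRelative(html, pageUrl) {
  // Origin = scheme + host (+ port), e.g. "http://www.example.com"
  const origin = new URL(pageUrl).origin;
  // Match src="/ or href="/ but not src="// (protocol-relative URLs)
  return html.replace(/\b(src|href)="\/(?!\/)/gi, (match, attr) => `${attr}="${origin}/`);
}
```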

Just use this function:

' Converts a relative URL to an absolute URI; absolute URLs are returned unchanged
Function RelativeToAbsoluteUrl(ByVal baseURI As Uri, ByVal relativeUrl As String) As Uri
    ' Parse the input, which may be relative or absolute
    ' (caution: on .NET Core under Linux/macOS, a leading "/" may be parsed as an absolute file:// URI)
    Dim uriReturn As New Uri(relativeUrl, UriKind.RelativeOrAbsolute)
    ' Make it absolute if it's relative
    If Not uriReturn.IsAbsoluteUri Then
        uriReturn = New Uri(baseURI, uriReturn)
    End If
    Return uriReturn
End Function

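Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.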

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM