
WebClient.DownloadFile 404 errors with HTML characters in URI?

I'm using the WebClient class to download files from a web site and have a couple of questions.

  1. When the URIs have HTML characters in the path (e.g. http://foo.com/path1 & path2.pdf) I get 404 (not found) errors. How can I prevent this? I thought HTML characters were safe?

  2. When the URIs represent a directory (e.g. http://foo.com/path) I get 403 (forbidden) errors. I understand why this is occurring, but how can I test my URI to see if it represents a directory with no index page?

  1. HTML-encoded characters are not safe for URLs. You need to URL encode them. If your data is stored HTML encoded, you'll want to use HttpUtility.HtmlDecode first to get a properly formatted URL (e.g. foo.com/page?foo=1&bar=2). If you have special characters that must go in URLs, like ampersands that are not part of the query portion of the URL, you'll want to URL encode them with HttpUtility.UrlEncode.
  2. You can't. Nothing in the URI itself tells you whether the server treats it as a directory with no index page.
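
The two points above can be sketched in C# roughly as follows. This is a minimal illustration, not a definitive implementation. The names UrlFixer, BuildUrl, StatusOf and TryDownload are hypothetical, and the input "path1 &amp; path2.pdf" is an assumed example of an HTML-encoded file name. The sketch uses System.Net.WebUtility.HtmlDecode in place of HttpUtility.HtmlDecode so that no reference to System.Web is needed, and Uri.EscapeDataString in place of HttpUtility.UrlEncode, because UrlEncode turns spaces into "+", which is form encoding and is read as a literal plus sign inside a path. For the directory case, it simply attempts the download and inspects the status code afterwards.

```csharp
using System;
using System.Net;

public static class UrlFixer
{
    // Hypothetical helper: turns an HTML-encoded file name into a
    // percent-encoded path segment appended to baseUrl.
    public static string BuildUrl(string baseUrl, string htmlEncodedFileName)
    {
        // Undo the HTML encoding first: "&amp;" becomes "&".
        string fileName = WebUtility.HtmlDecode(htmlEncodedFileName);

        // Percent-encode the segment: space becomes %20, "&" becomes %26.
        string segment = Uri.EscapeDataString(fileName);

        return baseUrl.TrimEnd('/') + "/" + segment;
    }

    // Returns the HTTP status carried by a WebException,
    // or null when the failure produced no HTTP response (e.g. a DNS error).
    public static HttpStatusCode? StatusOf(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        return response == null ? (HttpStatusCode?)null : response.StatusCode;
    }

    // Attempts the download. Returns false for 403/404 so the caller can
    // skip that URI; any other failure is rethrown.
    public static bool TryDownload(string url, string localPath)
    {
        try
        {
            using (var client = new WebClient())
            {
                client.DownloadFile(url, localPath);
            }
            return true;
        }
        catch (WebException ex)
        {
            HttpStatusCode? status = StatusOf(ex);
            if (status == HttpStatusCode.Forbidden || status == HttpStatusCode.NotFound)
            {
                return false;
            }
            throw;
        }
    }
}
```

With these assumptions, UrlFixer.BuildUrl("http://foo.com", "path1 &amp; path2.pdf") yields http://foo.com/path1%20%26%20path2.pdf, which can then be passed to UrlFixer.TryDownload together with a local file path.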

