
.NET WebClient.DownloadData: how to get the file type?

To handle downloads from a URL that has no file extension, I need to know what the file type is.

For example, how can I find out that WebClient.DownloadData downloaded a PNG [edit: JPEG] image from the URL below?

https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTw4P3HxyHR8wumE3lY3TOlGworijj2U2DawhY9wnmcPKnbmGHg

I did not find anything in the documentation that describes how to do this.

If you trust the header information, you can do this with WebClient; you don't need HttpClient:

var webClient = new WebClient();
var result = webClient.DownloadData(url);
var contentType = webClient.ResponseHeaders["Content-Type"];

if (contentType != null && 
    contentType.StartsWith("image", StringComparison.OrdinalIgnoreCase))
{
    // it's probably an image
}

It can't, directly.

If you trust the headers the web server sends back, you could use a different HTTP client (e.g. WebRequest or HttpClient) that makes the entire response available rather than just the body. You can then look at the Content-Type header.

Other than that, you'll need to look at the content itself. Various file types have "magic numbers" that you can use to identify the file. They're typically at the start of the file, and if you only have a limited set of file types to look for, this may well be a viable approach. It won't be able to identify all file types, though.

As an example, the first four bytes of the image you've linked to are ff d8 ff e0. That reveals that it's actually a JPEG image, not a PNG. As it happens, the server response also included a header of content-type: image/jpeg.
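As a rough illustration of the magic-number approach, here is a minimal C# sketch. The class and method names (ImageSniffer, GuessImageMimeType) are hypothetical and not part of any library, and it only recognizes four common image formats by their well-known leading bytes; anything else comes back as null.

```csharp
using System;

public static class ImageSniffer
{
    // Hypothetical helper: guesses a MIME type from the leading
    // "magic number" bytes. Returns null when the format is not recognized.
    public static string GuessImageMimeType(byte[] data)
    {
        if (data == null) return null;

        // JPEG files start with FF D8 FF.
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        // PNG files start with the 8-byte signature 89 'P' 'N' 'G' 0D 0A 1A 0A.
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "image/png";
        // GIF files start with the ASCII text "GIF8" (GIF87a or GIF89a).
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return "image/gif";
        // BMP files start with the ASCII text "BM".
        if (StartsWith(data, 0x42, 0x4D)) return "image/bmp";

        return null;
    }

    // True when data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With the byte array returned by WebClient.DownloadData, a call such as ImageSniffer.GuessImageMimeType(result) would return "image/jpeg" for data beginning with ff d8 ff e0.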

You can use HttpClient to make this GET request.

Sample code:

// Requires: using System.Net.Http; must run inside an async method.
HttpClient client = new HttpClient();
var response = await client.GetAsync("https://encrypted-tbn2.gstatic.com/images?q=tbn%3aANd9GcTw4P3HxyHR8wumE3lY3TOlGworijj2U2DawhY9wnmcPKnbmGHg");
// ContentType is null when the server sends no Content-Type header.
var filetype = response.Content.Headers.ContentType?.MediaType;
var imageArray = await response.Content.ReadAsByteArrayAsync();

In the code above, the filetype variable holds the media (MIME) type from the Content-Type header, such as image/jpeg or image/png. It indicates the file type, but it is not itself a file extension.
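If an actual file extension is needed, the media type has to be mapped to one. The following is only a sketch under stated assumptions: MediaTypeMap and GetExtension are hypothetical names, and the table covers just a few image types chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class MediaTypeMap
{
    // Illustrative, deliberately incomplete table of media types.
    // Lookups ignore case, so "image/JPEG" and "image/jpeg" both match.
    private static readonly Dictionary<string, string> Known =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "image/jpeg", ".jpg" },
            { "image/png", ".png" },
            { "image/gif", ".gif" },
            { "image/bmp", ".bmp" },
        };

    // Hypothetical helper: returns a file extension for a known media type,
    // or null when the type is missing or not in the table.
    public static string GetExtension(string mediaType)
    {
        if (string.IsNullOrEmpty(mediaType)) return null;

        string extension;
        return Known.TryGetValue(mediaType.Trim(), out extension) ? extension : null;
    }
}
```

For example, MediaTypeMap.GetExtension(filetype) would give ".jpg" when the server reports image/jpeg, and null for a type the table does not list.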

You can try using the FindMimeFromData API. Here is a snippet that may help.

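// Requires: using System.Net; using System.Runtime.InteropServices; (Windows only, uses urlmon.dll)
WebClient webClient = new WebClient();
var result = webClient.DownloadData(new Uri("url")); // "url" is a placeholder
IntPtr mimeout;
// Never claim more bytes than were downloaded; a fixed 4096 is wrong for shorter responses.
int size = Math.Min(result.Length, 4096);
int result2 = FindMimeFromData(IntPtr.Zero, "sample", result, size, null, 0, out mimeout, 0);
if (result2 != 0)
    throw Marshal.GetExceptionForHR(result2);
string mime = Marshal.PtrToStringUni(mimeout);
Marshal.FreeCoTaskMem(mimeout); // the API allocates the string; the caller frees it
Console.WriteLine(mime);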

And here is the API declaration (copied from elsewhere):

[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
static extern int FindMimeFromData(
    IntPtr pBC,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
    [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)] byte[] pBuffer,
    int cbSize,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
    int dwMimeFlags,
    out IntPtr ppwzMimeOut,
    int dwReserved);
