
HttpResponseMessage.Content.Headers ignoring charset setting in meta tag in HTML source

I recently posted this question, and an answer came right away. That answer, in turn, raises the following new question:

If my understanding is correct, the StreamContent object in HttpResponseMessage is created when an HTTP request is made via HttpClient.GetAsync. Its Headers property, or part of it, is then set according to the meta tags included in the HTML source file.

For instance, a meta tag can tell the response object which charset the file's contents are encoded in:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Requesting a resource that contains such a line will produce an HttpResponseMessage.Content.Headers with this setting.

In the other question referenced at the top, I described a response object being created without the correct encoding, even though the HTML source behind that incompatible response does contain the setting that should produce correctly encoded responses:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

Why, then, do responses from that site not receive the charset setting from the meta tag, so that they end up rendered in the wrong charset?

Here is a screenshot illustrating the question: both sites contain the meta tag with a charset setting, but for one of them the setting is, for some reason, missing.



Fiddler's header details for both requests:

Working one (Cookie header removed):

Request:

GET http://www.ynet.co.il/home/0,7340,L-8,00.html HTTP/1.1
Host: www.ynet.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Thu, 31 Mar 2016 10:04:39 GMT

Response:

HTTP/1.1 200 OK
vg_id: 1
X-me: 06
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 31 Mar 2016 10:38:57 GMT
Accept-Ranges: bytes
VX-Cache: HIT
WAI: 01
V-TTL: 0
backend-cache-control: 
Content-Length: 410685
Vary: Accept-Encoding
Date: Thu, 31 Mar 2016 10:38:48 GMT
Connection: keep-alive

Problematic one:

Request:

GET http://winedepot.co.il/ HTTP/1.1
Host: winedepot.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=201832727.725995063.1458660502.1459413977.1459418530.8; __utmz=201832727.1458660502.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=201832727; ASPSESSIONIDCQTRQCAQ=FEOHEBFCBGABBKOBAHOGKBGB
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 118225
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 31 Mar 2016 10:36:21 GMT

As your Fiddler captures show, HttpResponseMessage.Content.Headers.ContentType contains exactly what the response's Content-Type header specifies: "text/html; charset=UTF-8" for the first site, and plain "text/html" with no charset for the second.

HttpResponseMessage does not parse the response HTML or search it for <meta /> tags.
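To make the point concrete, here is a minimal, language-neutral sketch in Python (not the .NET implementation; `charset_from_content_type` is a hypothetical helper) that, like `ContentType.CharSet` in .NET, looks only at the Content-Type header value and never at the body. Fed the two header values from the Fiddler responses, it finds a charset only for the first site.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, or None.

    Only the header is inspected; a <meta> tag in the body is never seen.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the charset and returns None if absent
    return msg.get_content_charset()


# Header values copied from the two Fiddler responses
print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

In C#, the corresponding check would be reading `response.Content.Headers.ContentType?.CharSet`, which would be expected to be null for the second site, because its header carries no charset parameter.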

The content type comes from the HTTP header:

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

is part of the content and not part of the headers.
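Because the meta tag lives in the body, honoring it is up to the caller. Below is a minimal Python sketch of that idea, assuming the raw body bytes are available. The helper names and the 1024-byte scan limit are illustrative assumptions, and the regex is a crude stand-in for a real HTML parser. It prefers the header charset, then a charset sniffed from a meta tag near the start of the body, then UTF-8, loosely following the priority browsers use (BOM detection omitted).

```python
import re
from typing import Optional

# Crude pattern covering <meta charset=...> and
# <meta http-equiv=... content="...; charset=...">. Illustrative only.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:\-]+)""", re.IGNORECASE
)


def sniff_meta_charset(body: bytes, limit: int = 1024) -> Optional[str]:
    """Look for a charset declaration in the first `limit` bytes of an HTML body."""
    match = META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(body: bytes, header_charset: Optional[str]) -> str:
    """Decode an HTML body: header charset first, then <meta> tag, then UTF-8."""
    charset = header_charset or sniff_meta_charset(body) or "utf-8"
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # unknown charset name: fall back to UTF-8
        return body.decode("utf-8", errors="replace")


# A tiny page modeled on the problematic site: windows-1255 body, meta tag only
hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"
page = (b'<html><head><meta HTTP-EQUIV="Content-Type" '
        b'CONTENT="text/html; charset=windows-1255"></head><body>'
        + hebrew.encode("cp1255") + b"</body></html>")

print(sniff_meta_charset(page))           # windows-1255
print(hebrew in decode_html(page, None))  # True
```

Passing `header_charset=None` models the second site, whose Content-Type header is just `text/html`. In .NET, the same approach would mean reading the body with `ReadAsByteArrayAsync()` and decoding it yourself with `Encoding.GetEncoding(...)`; on .NET Core, windows-1255 additionally requires registering `CodePagesEncodingProvider.Instance` via `Encoding.RegisterProvider`.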

I suggest installing Fiddler to better understand what those requests actually do. Set Fiddler as your proxy and use its inspectors to see what is actually sent and received when you make HTTP requests.

A fuller explanation is beyond the scope of this answer.

