
HTTP/1.1 Protocol MultiLanguage

I'm working on a crawler and I want to know whether a page accepts multiple languages. My request is as follows:

GET / HTTP/1.1
Host: www.stackoverflow.com
Accept-Language: en

How can I tell from the response whether the page accepts more than one language? Is it in the headers? Does Content-Language specify just one language?

(This is an example response header, not the actual Stack Overflow response.)

HTTP/1.1 200 OK
Date: Sat, 06 Sep 2014 15:52:50 GMT
Server: Apache/2
Content-Location: qa-http-and-lang.en.php
Vary: negotiate,accept-language,Accept-Encoding
TCN: choice
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Language: en

First of all, you don't have to set the Accept-Language header in your request. You only have to parse the HTTP response and read its Content-Language header. It should list every language the content is intended for, as a comma-separated list of language tags (for example, Content-Language: mi, en). If no Content-Language is specified, the default is that the content is intended for all language audiences. This might mean that the sender does not consider it to be specific to any natural language, or that the sender does not know for which language it is intended.

So, if Content-Language is specified and has more than one value, then the page accepts multiple languages. If no Content-Language is specified, you have to decide for yourself whether to treat the page as accepting multiple languages or not.
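The check described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names (fetch_headers, parse_content_language, classify) and the three result labels are made up for this example, and the crawler is assumed to have the response headers available as a name-to-value mapping.

```python
import http.client
from typing import Dict, List, Mapping, Optional


def fetch_headers(host: str, path: str = "/") -> Dict[str, str]:
    """Hypothetical helper: send a GET request and return the response headers."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return dict(response.getheaders())
    finally:
        conn.close()


def parse_content_language(value: Optional[str]) -> List[str]:
    """Split a Content-Language value such as "mi, en" into language tags."""
    if not value:
        return []
    # Language tags are case-insensitive, so normalise them to lower case.
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def classify(headers: Mapping[str, str]) -> str:
    """Return "multiple", "single", or "unspecified" for a set of headers."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    tags = parse_content_language(lowered.get("content-language"))
    if not tags:
        # No Content-Language: the caller has to decide how to treat this.
        return "unspecified"
    return "multiple" if len(tags) > 1 else "single"


# Headers taken from the example response in the question.
example = {"Content-Type": "text/html; charset=utf-8", "Content-Language": "en"}
print(classify(example))                       # single
print(classify({"content-language": "mi, en"}))  # multiple
print(classify({}))                            # unspecified
```

In a crawler, the mapping returned by fetch_headers("www.stackoverflow.com") could be passed straight to classify; the "unspecified" case is left to the caller, as explained above.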

Reference: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.12

Hope it helps
