I am screen scraping a website that is in Danish. I am unable to scrape certain characters correctly, such as the "å" in "må". Any idea how to solve this? Thanks.
Try using the UTF-8 or Windows-1252 character set.
It's better to use the same encoding that the HttpWebResponse object reports. The code below decodes the response with whatever charset the server declares, so it is not tied to one language:
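// Requires: using System.IO; using System.Net; using System.Text;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// CharacterSet can be null or empty if the server declares no charset; fall back to UTF-8.
string charset = response.CharacterSet;
Encoding encoding = string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
string html = null;
if (response.StatusCode == HttpStatusCode.OK)
{
    // Decode the body with the declared encoding and dispose the reader when done.
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
    {
        html = reader.ReadToEnd();
    }
}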
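A note on robustness: `Encoding.GetEncoding` throws an `ArgumentException` for a charset name it does not recognise, and `CharacterSet` may be empty when the server declares none. Below is a minimal sketch of a defensive lookup. The helper name `ResolveEncoding` and the UTF-8 fallback are my assumptions, not part of the answer above.

```csharp
using System;
using System.Text;

static class CharsetHelper
{
    // Hypothetical helper: maps a declared charset name to an Encoding,
    // falling back to UTF-8 when the name is missing or unknown.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return Encoding.UTF8;
        }
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // Unknown charset name: use the assumed default.
            return Encoding.UTF8;
        }
    }
}
```

With this in place, `CharsetHelper.ResolveEncoding(response.CharacterSet)` could be passed to the `StreamReader` instead of calling `Encoding.GetEncoding` directly.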
If you are using a WebBrowser control, you can set the page encoding to one that displays the character correctly, then extract the page source.
I just used System.Web.HttpContext.Current.Server.HtmlDecode() and it works.
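HtmlDecode presumably helps when the page sends the character as an HTML entity (for example `&aring;`) rather than as a raw byte. Here is a small sketch of that case. It swaps in `System.Net.WebUtility.HtmlDecode` so that it runs outside an ASP.NET request, and the wrapper name `DecodeEntities` is hypothetical.

```csharp
using System.Net;

static class EntityDemo
{
    // Hypothetical wrapper: turns HTML entities such as &aring; or &#229;
    // back into the characters they stand for.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }
}
```

For instance, `EntityDemo.DecodeEntities("m&aring;")` and `EntityDemo.DecodeEntities("m&#229;")` both return "må". This only addresses entity-encoded text; it does not fix bytes that were decoded with the wrong charset.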
I use iso-8859-1 for decoding. HTH
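To see why the choice of charset matters, here is a short sketch that decodes the bytes for "må" with matching and mismatching encodings. The class and method names are made up for illustration.

```csharp
using System.Text;

static class EncodingDemo
{
    // "må" as ISO-8859-1 bytes: 'm' = 0x6D, 'å' = 0xE5.
    public static readonly byte[] Latin1Bytes = { 0x6D, 0xE5 };

    // "må" as UTF-8 bytes: 'å' becomes the two bytes 0xC3 0xA5.
    public static readonly byte[] Utf8Bytes = { 0x6D, 0xC3, 0xA5 };

    // Decodes raw bytes with the named charset.
    public static string Decode(byte[] bytes, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(bytes);
    }
}
```

`Decode(Latin1Bytes, "iso-8859-1")` and `Decode(Utf8Bytes, "utf-8")` both give "må", while `Decode(Utf8Bytes, "iso-8859-1")` gives "mÃ¥". So ISO-8859-1 is only the right choice when the site actually serves ISO-8859-1 bytes.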