screen scraping

Question

I am screen scraping a website that is in Danish. I am unable to scrape certain characters, such as "må". Any idea how to solve this? Thanks.

Answer 1

Try using the UTF-8 or Windows-1252 character set.
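
To see why the choice of charset matters, here is a minimal sketch. The class name EncodingDemo and both helper methods are made up for illustration and are not part of the original answer. It decodes raw bytes with a named charset, and also shows one possible fallback strategy: try strict UTF-8 first, then fall back to a single-byte charset. ISO-8859-1 stands in for the single-byte case because å is byte 0xE5 in both ISO-8859-1 and Windows-1252, and on .NET Core Windows-1252 may need the code pages encoding provider registered first.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingDemo
{
    // Decodes raw response bytes with the charset named by the caller.
    public static string Decode(byte[] body, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(body);
    }

    // Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    // falls back to ISO-8859-1 (an assumption, pick what fits the site).
    public static string DecodeUtf8OrLatin1(byte[] body)
    {
        try
        {
            // throwOnInvalidBytes: true makes invalid input raise an exception
            // instead of silently producing replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(body);
        }
        catch (DecoderFallbackException)
        {
            return Encoding.GetEncoding("iso-8859-1").GetString(body);
        }
    }
}
```

For example, EncodingDemo.Decode(new byte[] { 0x6D, 0xC3, 0xA5 }, "iso-8859-1") gives "mÃ¥", which is the typical garbage you see when UTF-8 bytes are read as a single-byte charset. The same bytes decoded as "utf-8" give "må".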

Answer 2

It's better to use the same encoding that the HttpWebResponse object reports. The code below should work for any language and character set, provided the server declares its charset correctly.

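        // Requires: using System.IO; using System.Net; using System.Text;
        string html = null;
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // CharacterSet can be missing or empty if the server sends no charset; fall back to UTF-8.
            string charset = response.CharacterSet;
            Encoding encoding = string.IsNullOrEmpty(charset)
                ? Encoding.UTF8
                : Encoding.GetEncoding(charset);

            if (response.StatusCode == HttpStatusCode.OK)
            {
                // Decode the body with the encoding the server declared.
                using (var reader = new StreamReader(response.GetResponseStream(), encoding))
                {
                    html = reader.ReadToEnd();
                }
            }
        }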

Answer 3

If you are using a WebBrowser control, you can set the page encoding to one that can display that character. Then just extract the page source.

Answer 4

I just used System.Web.HttpContext.Current.Server.HtmlDecode() and it worked.
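
That this worked suggests the page was sending the characters as HTML entities (for example &aring; for å) rather than as raw bytes. Here is a rough sketch of the same idea, with one swap: it uses System.Net.WebUtility.HtmlDecode instead of Server.HtmlDecode, so it does not need an ASP.NET HttpContext. The class and method names are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class EntityDemo
{
    // Turns HTML entities such as &aring; or &#229; back into characters.
    public static string DecodeEntities(string scrapedText)
    {
        return WebUtility.HtmlDecode(scrapedText);
    }
}
```

For example, EntityDemo.DecodeEntities("m&aring;") and EntityDemo.DecodeEntities("m&#229;") both give "må".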

Answer 5

I use iso-8859-1 for decoding. HTH

5 answers

solution1
1 2010-05-28 11:56:04

solution2
0 2012-10-13 13:55:37

solution3
0 2010-05-29 01:35:52

solution4
0 ACCEPTED 2010-06-01 13:26:45

solution5
0 2011-08-03 20:50:55
