DownloadString and Special Characters

I am trying to find the index of Mauricio in a string that I download from a website using WebClient and DownloadString. However, on the website the name contains a foreign character: Maurício. So I found some code elsewhere

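// Requires: using System.Globalization; using System.Linq; using System.Text;
string ToASCII(string s)
{
    // Decompose accented letters (FormD), then drop the combining marks.
    return String.Join("",
        s.Normalize(NormalizationForm.FormD)
         .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
}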

that converts foreign characters. I have tested the code and it works. The problem is that when I download the string, it comes back as MaurA-cio. I have tried both of the following:

wc.Encoding = System.Text.Encoding.UTF8;
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");

Neither stops it from downloading as MaurA-cio.

(Also, I cannot change the search term, because I am getting it from a list.)

What else can I try? Thanks

var client = new WebClient { Encoding = System.Text.Encoding.UTF8 };

var json = client.DownloadString(url);

This works for any character, as long as the server actually sends the page as UTF-8.
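To see why the decoder matters, here is a minimal sketch. It assumes, as a guess about the asker's situation, that the page is sent as UTF-8 but decoded as ISO-8859-1. The class name `EncodingDemo` and the helper `StripDiacritics` are made up for this illustration; the helper repeats the idea of the question's ToASCII.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class EncodingDemo
{
    // Same diacritic-stripping idea as the ToASCII helper in the question.
    public static string StripDiacritics(string s) =>
        string.Concat(s.Normalize(NormalizationForm.FormD)
            .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));

    // Decode the UTF-8 bytes of `text` with the named encoding.
    public static string DecodeUtf8BytesAs(string text, string encodingName) =>
        Encoding.GetEncoding(encodingName).GetString(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        const string name = "Maur\u00EDcio"; // "Maurício", written with an escape

        // Wrong decoder: each UTF-8 byte becomes its own character.
        string garbled = DecodeUtf8BytesAs(name, "ISO-8859-1");
        // Right decoder: the two bytes of "í" are read back as one character.
        string correct = DecodeUtf8BytesAs(name, "utf-8");

        Console.WriteLine(garbled.Length); // 9, one more than the 8 in the original
        Console.WriteLine(StripDiacritics(garbled).IndexOf("Mauricio", StringComparison.Ordinal)); // -1
        Console.WriteLine(StripDiacritics(correct).IndexOf("Mauricio", StringComparison.Ordinal)); // 0
    }
}
```

In the garbled case the accent-stripping step cannot help: the two bytes of "í" have already turned into two unrelated characters, so the search term is never found. The decoding has to be right before any normalization is applied.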

DownloadString doesn't look at the HTTP response headers. It uses the previously set WebClient.Encoding property. If you have to use it, get the headers first:

// Requires: using System.Text; using System.Text.RegularExpressions;
// Call twice: the first request is only there to read the response headers
// (or just do a HEAD, see http://stackoverflow.com/questions/3268926/head-with-webclient)
webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
var contentType = webClient.ResponseHeaders["Content-Type"];
var charset = Regex.Match(contentType, "charset=([^;]+)").Groups[1].Value;

// Decode the second download with the charset the server declared
webClient.Encoding = Encoding.GetEncoding(charset);
var s = webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
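If the second request is a concern, one possible variation is to download the raw bytes once with DownloadData and decode them after reading the header. This is only a sketch, not part of the original answer: the class `CharsetAwareDownload` and both helper names are hypothetical, and the fallback to UTF-8 when no usable charset is declared is an assumption.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetAwareDownload
{
    // Hypothetical helper: pick the decoder named in a Content-Type value,
    // falling back to UTF-8 (an assumption) when no usable charset is present.
    public static Encoding EncodingFromContentType(string contentType)
    {
        var match = Regex.Match(contentType ?? "",
            @"charset=""?([^;""\s]+)", RegexOptions.IgnoreCase);
        if (!match.Success) return Encoding.UTF8;
        try { return Encoding.GetEncoding(match.Groups[1].Value); }
        catch (ArgumentException) { return Encoding.UTF8; } // unknown charset name
    }

    // Hypothetical helper: one request, then decode the bytes ourselves.
    public static string DownloadText(string url)
    {
        using (var client = new WebClient())
        {
            byte[] body = client.DownloadData(url);
            string contentType = client.ResponseHeaders["Content-Type"];
            return EncodingFromContentType(contentType).GetString(body);
        }
    }
}
```

Calling `CharsetAwareDownload.DownloadText(url)` would then take the place of the two DownloadString calls. Note that this only honors the charset in the HTTP header; a page that declares its encoding only in an HTML meta tag, or that starts with a byte order mark, would need extra handling.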

BTW, Unicode doesn't define "foreign" characters. From Maurício's perspective, "Mauricio" would be the foreign spelling of his name.
