How can I convert webpage Unicode to ASCII?

I am attempting to convert a webpage from a format I don't understand to ASCII so I can look for certain data. I retrieve the data using WebClient with the URL of the web page, then use Encoding to convert the data from what I think is Unicode to ASCII, but the format doesn't change at all. Below is my code:

WebClient web = new WebClient();
string page = "https://www.myurl.com/";

Stream data = web.OpenRead(page);
StreamReader reader1 = new StreamReader(data);
string input = reader1.ReadToEnd();
Encoding unicode = Encoding.Unicode;
Encoding ascii = Encoding.ASCII;

string webpage = ascii.GetString(
  Encoding.Convert(unicode, ascii, unicode.GetBytes(input))
);

Below is what the webpage data looks like. It is the same as the input data, which suggests my conversion didn't work.

     \"sprited\":true,\"spriteCssClass\":\"sx_a11c08\",\"spriteMapCssClass\":\"sp_SN-oNOqlzVS\"},\"505789\":{\"sprited\":true,\"spriteCssClass\":\"sx_5219b1\",\"spriteMapCssClass\":\"sp_SN-oNOqlzVS\"},\"505782\":{\"sprited\":true,\"spriteCssClass\":\"sx_c0671f\",\"spriteMapCssClass\":\"sp_SN-oNOqlzVS\"},\"505794\":{\"sprited\":true,\"spriteCssClass\":\"sx_8cf344\",\"spriteMapCssClass\":\"sp_SN-oNOqlzVS\"},\"495429\": 

Does anyone know what kind of data this is and how to convert it into data I can understand? When I view the page source in the browser, none of this weird data shows up. In other words, the data I get from WebClient doesn't look at all like the page source in the browser.

Is that the full web page data? It looks incomplete on both ends. To me, it looks like JSON data. You can convert it into a C# object by using the JavaScriptSerializer class.

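// JavaScriptSerializer lives in System.Web.Script.Serialization (System.Web.Extensions assembly).
JavaScriptSerializer json_serializer = new JavaScriptSerializer();
// Deserialize<T> builds a Test directly; DeserializeObject returns an untyped graph (typically a Dictionary<string, object>) that cannot be cast to Test.
Test resultingData = json_serializer.Deserialize<Test>(your_data);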
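
As a side note on why the conversion in the question seems to do nothing: by the time ReadToEnd returns, the text is already an ordinary .NET string, and the sample shown contains only ASCII characters, so a Unicode-to-ASCII round trip hands back the same text. The backslashes are literal characters in the payload, not an encoding artifact. A minimal sketch to check this (the class and method names are made up for illustration):

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Repeats the round trip from the question: UTF-16 bytes -> ASCII bytes -> string.
    public static string UnicodeToAscii(string input)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(input);
        byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, utf16Bytes);
        return Encoding.ASCII.GetString(asciiBytes);
    }
}
```

For ASCII-only input the result equals the input. Characters outside ASCII are replaced with '?', so the conversion can only lose information; it never decodes anything.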

If you want to read JSON from a request, do it like this:

var json = web.DownloadString(page);

Then you need to deserialize the string into an object. If you know the type of the model in the response, you can do it like this; let's say it's ResponseType.

using Newtonsoft.Json;

...

var result = JsonConvert.DeserializeObject<ResponseType>(json);
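
For the fragment in the question, a typed version might look roughly like the sketch below. SpriteInfo and SpriteParser are hypothetical names, the property list is guessed from the three fields visible in the sample, and the code assumes you have the complete, unescaped JSON object (plain quotes instead of \"), which the truncated sample does not show.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical model for one entry of the fragment shown in the question.
public class SpriteInfo
{
    [JsonProperty("sprited")]
    public bool Sprited { get; set; }

    [JsonProperty("spriteCssClass")]
    public string SpriteCssClass { get; set; }

    [JsonProperty("spriteMapCssClass")]
    public string SpriteMapCssClass { get; set; }
}

public static class SpriteParser
{
    // The top-level keys ("505789", ...) vary per entry, so a dictionary
    // keyed by string stands in for ResponseType here.
    public static Dictionary<string, SpriteInfo> Parse(string json)
    {
        return JsonConvert.DeserializeObject<Dictionary<string, SpriteInfo>>(json);
    }
}
```

After parsing, an entry can be read with something like SpriteParser.Parse(json)["505789"].SpriteCssClass.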

There is a NuGet package called Facebook which you can import into your project. This will give you some models that might match up with the type.


If you don't know the type of the response, you could do something like this:

using Newtonsoft.Json.Linq;

...

var jObject = JObject.Parse(json);

Then you can use LINQ to query the object.
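
A rough sketch of such a query, using the field names from the sample in the question. SpriteQuery and SpritedClasses are hypothetical names, and the code assumes the input is a complete JSON object with plain quotes rather than the escaped, truncated fragment shown above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class SpriteQuery
{
    // Returns the spriteCssClass of every top-level entry whose "sprited" flag is true.
    public static List<string> SpritedClasses(string json)
    {
        JObject root = JObject.Parse(json);
        return root.Properties()
            // Skip anything that is not a nested object before indexing into it.
            .Where(p => p.Value is JObject && (bool?)p.Value["sprited"] == true)
            .Select(p => (string)p.Value["spriteCssClass"])
            .ToList();
    }
}
```

Because JObject keeps properties in document order, the results come back in the same order as the entries appear in the JSON.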
