Decoding a string that has been encoded multiple times

How do I decode this to get the result below?

/browse_ajax?action_continuation=1\u0026amp;continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

I've tried the functions below, including calling them several times, since I read that strings may be encoded more than once.

System.Text.RegularExpressions.Regex.Unescape(string)
System.Uri.UnescapeDataString(string)
System.Net.WebUtility.UrlDecode(string)

Which is the right function here, or rather, in what order do I need to call them to get that result? The strings vary and may contain other special characters, so working around the problem by editing them myself is too risky.

The string has to be decoded to work with new System.Net.WebClient().DownloadString(string).

EDIT: It turns out the statement above is wrong; I do not have to decode the string to use WebClient.DownloadString(string). However, the downloaded string suffers from similar encoding. Setting the WebClient's Encoding property to UTF8 before downloading does most of the job, but some characters still come out escaped: for example, double quotes and ampersands stay as \" and \&.

I don't know how to turn \& into &, so that I can then change &amp; to &.

The fact that this string is double (actually triple) encoded is a sign that it is not being encoded correctly. If you own the code that encodes these strings, consider fixing the problem there, since that is the root of the issue.

That said, here are the decoding calls you need to make to decode this. I do not recommend this solution, as it is definitely a workaround. Again, the problematic behavior is in the code doing the encoding.

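// In C# source code, the compiler already turns \u0026 into & inside this literal.
string val = "/browse_ajax?action_continuation=1\u0026amp;continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D";
// Undo two layers of percent-encoding: %253D -> %3D -> =
val = System.Uri.UnescapeDataString(val);
val = System.Uri.UnescapeDataString(val);
// Resolve the HTML entity: &amp; -> & (HttpUtility needs a reference to System.Web)
val = System.Web.HttpUtility.HtmlDecode(val);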

This will give you:

/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ2ZXdHFQeUJNR183aTMzT2VlTnNaWncaJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA==
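
Note that when the string is typed into C# source as a regular literal, the compiler converts \u0026 to & before any of the calls run. If the text arrives at runtime instead (for example inside a downloaded page) and really contains a backslash followed by u0026, an extra step is probably needed first. A minimal sketch of that case, assuming the three layers are backslash escapes, double percent-encoding, and HTML entities; the class and method names are made up for illustration, and System.Net.WebUtility is used in place of System.Web.HttpUtility only to avoid the System.Web reference:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class MultiDecode
{
    // Hypothetical helper: undoes the three layers seen in the question's string.
    public static string DecodeAll(string raw)
    {
        // 1. Turn backslash escapes such as \u0026 into the real character (&).
        string s = Regex.Unescape(raw);
        // 2. Undo percent-encoding twice: %253D -> %3D -> =
        s = Uri.UnescapeDataString(s);
        s = Uri.UnescapeDataString(s);
        // 3. Resolve HTML entities: &amp; -> &
        return WebUtility.HtmlDecode(s);
    }
}
```

For example, MultiDecode.DecodeAll(@"/a?x=1\u0026amp;y=b%253D%253D") should give /a?x=1&y=b==. One caveat: Regex.Unescape is meant for regex escapes, so input that contains other backslash sequences may not be handled the way you expect and is worth testing separately.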

If you want the equals signs to stay percent-encoded, call Uri.UnescapeDataString(string) only once. This leaves them encoded as %3D, which is their proper encoded value, rather than the double-encoded %253D.
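
To make the effect of each percent-decoding pass visible, here is a small sketch. PercentLayers and its layers parameter are hypothetical names, not part of the framework; the only real API used is Uri.UnescapeDataString.

```csharp
using System;

public static class PercentLayers
{
    // Hypothetical helper: removes the given number of percent-encoding layers.
    public static string Unescape(string value, int layers)
    {
        for (int i = 0; i < layers; i++)
        {
            // Each pass decodes one layer, e.g. %253D -> %3D, then %3D -> =
            value = Uri.UnescapeDataString(value);
        }
        return value;
    }
}
```

With this, PercentLayers.Unescape("A%253D%253D", 1) yields A%3D%3D, and passing 2 yields A==.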

It looked as though the mystery was solved, but I ran into it again and could not find a built-in solution: the built-in functions seem to fail when the escaped character is part of an HTML-escaped entity.

Since only the ampersand seems to be affected in these strings, I had to call Replace(@"\&","&") first so that HtmlDecode could then produce a proper string.
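
Put together, that workaround might look roughly like this. It is only a sketch: DownloadedTextFixer is an invented name, and it assumes the HtmlDecode in question is System.Net.WebUtility.HtmlDecode (System.Web.HttpUtility.HtmlDecode could be substituted).

```csharp
using System.Net;

public static class DownloadedTextFixer
{
    // Hypothetical helper for the workaround described above.
    public static string Fix(string downloaded)
    {
        // Drop the backslash in front of ampersands: \&amp; -> &amp;
        string s = downloaded.Replace(@"\&", "&");
        // Resolve HTML entities: &amp; -> &
        return WebUtility.HtmlDecode(s);
    }
}
```

For example, DownloadedTextFixer.Fix(@"a=1\&amp;b=2") returns a=1&b=2. It deliberately touches nothing except the \& sequence, so other escapes such as \" would still need their own handling.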
