Converting email subject from “?UTF-8?…” to string?

I'm using these techniques to convert =?utf-8?B?...?= to a readable string:

How convert email subject from “?UTF-8?…?=” to readable string?

string encode / decode

It works for simple input, but some of my inputs contain several =?utf-8?B?...?= blocks one after another, for example:

"=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?="

I know the part between =?UTF-8?B? and ?= is a Base64-encoded string, but in this case I have no idea how to extract each part.

You can use a regex to extract each string between =?utf-8?B? and ?=, then Base64-decode every match. Here's an example:

string input = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";
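// Match each encoded word lazily; IgnoreCase also accepts "=?UTF-8?b?" and similar spellings.
Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape("=?utf-8?B?"), Regex.Escape("?=")), RegexOptions.IgnoreCase);
var matches = regex.Matches(input);
foreach (Match match in matches)
{
    // Group 1 holds the Base64 payload of one encoded word.
    Console.WriteLine(Encoding.UTF8.GetString(Convert.FromBase64String(match.Groups[1].Value)));
}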

This will print:

این یک متن ساده است
این یک متن ساده است
ندج

Don't forget to include these using statements:

using System;
using System.Text;
using System.Text.RegularExpressions;

Try with something like:

string str = "=?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2KfbjNmGINuM2qkg2YXYqtmGINiz2KfYr9mHINin2LPYqg==?= =?utf-8?B?2YbYr9is?=";

const string utf8b = "=?utf-8?B?";

var parts = str.Split(new[] { "?=" }, StringSplitOptions.None);

foreach (var part in parts)
{
    string str2 = part.Trim();

    if (str2.StartsWith(utf8b, StringComparison.OrdinalIgnoreCase))
    {
        str2 = str2.Substring(utf8b.Length);
        byte[] bytes = Convert.FromBase64String(str2);
        string final = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(final);
    }
    else if (str2 == string.Empty)
    {
        // Nothing to do here
    }
    else
    {
        Console.WriteLine("Not recognized {0}", str2);
    }
}

Note that technically RFC 1342 (since superseded by RFC 2047) is a little more complex: instead of utf-8 you could have any charset, and instead of B you could have Q (a variant of quoted-printable).
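
To illustrate the note above, here is a minimal sketch of a decoder that reads the charset and the B/Q marker from each encoded word instead of hard-coding utf-8 and B. The names EncodedWordDecoder, Decode and DecodeQ are hypothetical and only serve this example. It assumes the charset name is one that Encoding.GetEncoding recognizes on your runtime (on .NET Core, many legacy code pages need an encoding provider registered first), and it does not try to cover every corner of the RFC.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordDecoder
{
    // =?charset?encoding?text?= where encoding is B or Q in either letter case.
    private static readonly Regex EncodedWord = new Regex(
        @"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=");

    // Whitespace that only separates two adjacent encoded words is dropped.
    private static readonly Regex Gap = new Regex(@"(?<=\?=)\s+(?==\?)");

    public static string Decode(string header)
    {
        string joined = Gap.Replace(header, "");
        return EncodedWord.Replace(joined, m =>
        {
            // Throws ArgumentException if the charset name is unknown.
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string payload = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(payload)
                : DecodeQ(payload);
            return charset.GetString(bytes);
        });
    }

    // Q encoding: "_" means space, "=XX" is a hex-encoded byte,
    // and every other character stands for itself.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

For example, EncodedWordDecoder.Decode("Re: =?utf-8?B?SGVsbG8=?= =?utf-8?Q?_World?=") returns "Re: Hello World". Unlike the snippets above, this returns one joined string rather than printing each part on its own line, because whitespace between adjacent encoded words is not part of the subject text.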
