
AES - Decryption

I have some C# code that encrypts the body of an email before it sends it to another email account, using AES. I believe the default mode for AES in C# is CBC and I also believe the default padding method in C# is PKCS#7 .

The C# code applies the Default encoding to encode the ciphertext, possibly using the machine's active code page. The active code page of both the server and the local machine is cp437 . Decryption is done using C++ in the production environment and it works. I need a Python 3 equivalent to handle decryption.

I decided to write a simple example program to help you understand why simply copying and pasting a string (which is really just a byte array) makes you lose data, so that you cannot decrypt correctly from the pasted string. By the way, the answer you posted in the comments perfectly explains why it is a bad idea to store encrypted data in a string.

It is not a good idea to store encrypted data in Strings because they are for human-readable text , not for arbitrary binary data. For binary data it's best to use byte[]

As many others have mentioned in the comments as well, in any given encoding there may very well be some characters that are not printable, i.e. that have no representation to print on the screen. So if your encrypted text contains some of those non-printable characters, the on-screen string interpretation will throw away information. I will also try to explain why the code "seems" to work on your production server.

Hint: You mention that

...on receiver side works in the same manner by first taking in the encrypted email body as a string (not in byte array form)

There is no such thing as sending a string over email; it is just bytes, and you only see an interpretation of them on the receiver end. Anyways, let us get back to the question at hand.

First of all, let me abstract away your encryption implementation. One great philosopher once said

We in software development love abstractions.

Here is your AES encryption in CBC mode. I tried it and it works:

private byte[] EncryptString(string inputText) {
    // great encryption stuff
    return encryptedBytes;
}

And somewhere in your code, you use it like this:

// you mention in comments that this is your code page
var encoder = Encoding.GetEncoding(437);    
var encrypted = EncryptString(body);
var email = new MailMessage {
       ...
       Body = encoder.GetString(encrypted)
       ...
};

Now let us see how it looks so far! Some screenshots are on their way. For a given key and IV, I got the following 26-element encrypted byte array:

var encrypted = new byte[] {
                    8, 9, 10, 11, 12, 13, 14, 15, 16,
                    65, 66, 67, 68, 69, 70, 71, 72,
                    73, 74, 75, 0, 1, 2, 3, 4, 5
                 };

Aaaaand how does the body look in the debugger? Looks like some of the characters are already non-printable control characters, such as \b (backspace), \t (tab) or \f (form feed: eject paper / clear video terminal).


Anyways, what does the string representation look like then? Please take note that I used CTRL + A to select all the available string info and CTRL + C to copy it to my clipboard.


Now let us convert the copy-pasted string back using the same encoding and see if we get the same byte array. Spoiler: lol of course not


I had 26 bytes before the copy-paste and now I have only 17. What happened to the other 9 bytes? Because they were not printable, they were simply not copied when I moved the string between text editors.

Since part of the ciphertext has been thrown away along the way (as mentioned in the comments), you cannot expect to decrypt it correctly in Python.
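
The effect can be sketched in Python with the same 26 bytes. The "copy/paste" step here is only a hypothetical model that drops every non-printable character; a real clipboard or editor may keep some control characters (tab, newline), which is presumably why the screenshots showed 17 surviving bytes rather than the 11 this model keeps.

```python
# The 26 example bytes from the C# snippet above.
encrypted = bytes([8, 9, 10, 11, 12, 13, 14, 15, 16,
                   65, 66, 67, 68, 69, 70, 71, 72,
                   73, 74, 75, 0, 1, 2, 3, 4, 5])

# Decoding never fails: cp437 assigns a character to every byte value.
as_text = encrypted.decode("cp437")

# Hypothetical model of a lossy copy/paste: keep printable characters only.
pasted = "".join(ch for ch in as_text if ch.isprintable())

# Converting the pasted text back cannot restore what was dropped.
recovered = pasted.encode("cp437")

print(len(encrypted), len(recovered))
print(recovered == encrypted)
```

Whatever the exact number of dropped characters, the recovered byte array is shorter than the ciphertext, so it can no longer be decrypted.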

WHY DOES IT WORK IN PRODUCTION SERVER - TO BE EDITED

Before the mail is sent, an explicit decoding is performed in the C# code using the default encoding, which according to the question is Cp437. With Cp437, however, the encoding fails, whereas it is successful with Cp1252 .

Using Cp1252 results in mail.BodyEncoding being implicitly set from ASCIIEncoding (default) to UTF8Encoding and mail.BodyTransferEncoding to TransferEncoding.Base64 .
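
On the receiving side, those two settings show up as MIME headers. A minimal sketch with Python's standard `email` package, assuming a hand-built message (the headers and body below are made up for illustration, not taken from the real mail):

```python
import base64
from email import message_from_string

# Hypothetical message: a UTF-8 text body sent with Base64 transfer encoding.
original_text = "j`\u00cc\u0152\u20ac"
raw_message = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(original_text.encode("utf-8")).decode("ascii")
    + "\n"
)

msg = message_from_string(raw_message)
payload = msg.get_payload(decode=True)   # undoes the Base64 step, returns bytes
charset = msg.get_content_charset()      # charset declared in Content-Type
text = payload.decode(charset)           # undoes the UTF-8 step

print(charset, text == original_text)
```

The result is still text, not ciphertext: the charset step that the sender applied before handing the body to the mail library has to be undone separately.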

Cp1252 has (e.g. in contrast to Cp437) undefined codepoints, namely 0x81, 0x8d, 0x8f, 0x90 and 0x9d. While the defined codepoints of Cp1252 can be converted to UTF-8 without any problems (each Cp1252 character also corresponds to a valid Unicode character, e.g. codepoint U+20AC (€): 0x80 (Cp1252), 0xE282AC (UTF-8)), it is not clear a priori how the undefined codepoints are converted. It turns out that these codepoints are simply UTF-8 encoded, i.e. 0x81, 0x8d, 0x8f, 0x90 and 0x9d are converted to 0xc281, 0xc28d, 0xc28f, 0xc290 and 0xc29d. After the UTF-8 encoding, the Base64 encoding is performed.
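
These byte values can be checked in Python. Note one difference that matters for the decoding below: Python's strict `cp1252` codec refuses the five undefined bytes, whereas the C# side evidently passed them through as U+0081, U+008D and so on. The helper name `python_rejects` is made up for this sketch.

```python
UNDEFINED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def python_rejects(byte_value):
    """Return True if Python's strict cp1252 codec cannot decode this byte."""
    try:
        bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False

# A defined codepoint: the euro sign.
euro = "\u20ac"
print(euro.encode("cp1252").hex(), euro.encode("utf-8").hex())

# The undefined codepoints, taken as U+0081 etc. and encoded as UTF-8.
utf8_forms = {b: chr(b).encode("utf-8") for b in UNDEFINED}
for b, encoded in utf8_forms.items():
    print(hex(b), encoded.hex(), python_rejects(b))
```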

For the decryption in the Python code you simply have to proceed in the opposite direction: First Base64 decoding, then UTF8 decoding and finally Cp1252 encoding (keeping in mind the undefined codepoints). The result is the actual ciphertext.

A possible implementation of the encoding is:

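import base64

def customEncode(ciphertextB64):
    # Undo the transfer encoding: Base64 -> UTF-8 bytes -> text
    cipherbytes = base64.b64decode(ciphertextB64)
    ciphertext = cipherbytes.decode('utf8')
    # Bytes that Cp1252 leaves undefined; they arrive as U+0081, U+008D, ...
    undefCodepoints = [0x81, 0x8d, 0x8f, 0x90, 0x9d]
    result = []
    for char in ciphertext:
        if ord(char) in undefCodepoints:
            # Undefined in Cp1252: map the codepoint straight back to its byte
            data = bytes([ord(char)])
        else:
            # Defined in Cp1252: encode the character back to its single byte
            data = char.encode('Cp1252')
        result.append(data)
    return b''.join(result)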
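
As a sanity check of this approach, the sketch below runs all 256 byte values through an assumed model of the sender (Cp1252 decoding with the five undefined bytes passed through, then UTF-8, then Base64) and back. `simulate_sender` and `recover` are hypothetical names; `recover` follows the same logic as the function above, and the sender model is an assumption based on the behaviour described earlier, not the actual C# code.

```python
import base64

UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def simulate_sender(cipherbytes):
    # Assumed model of the C# side: Cp1252 text -> UTF-8 -> Base64.
    chars = [chr(b) if b in UNDEFINED else bytes([b]).decode("cp1252")
             for b in cipherbytes]
    return base64.b64encode("".join(chars).encode("utf-8"))

def recover(ciphertext_b64):
    # Reverse direction: Base64 -> UTF-8 -> Cp1252 bytes.
    text = base64.b64decode(ciphertext_b64).decode("utf-8")
    return b"".join(bytes([ord(c)]) if ord(c) in UNDEFINED
                    else c.encode("cp1252") for c in text)

all_bytes = bytes(range(256))
round_trip = recover(simulate_sender(all_bytes))
print(round_trip == all_bytes)
```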

With this, the posted ciphertext can be encoded and decrypted:

ciphertextB64 = """amDDjAsQJjEzw7nFvSpcIgHFksO/xb3CoF7Cv+KAsMKN4oCZNeKAulxaHwghwo1ExaEQKcOrGMO9
                   Iw4Rw4sncsOLxb3Di8OKwqs0AgdFwo1CB8Oy4oCTPkrDlSbDkTDDtB3Cj2PDocW4AcKxM0bigJnD
                   gsOsw6scw47DlQHCuEEnwqxZwqnDp8KdDBNzw7JKw70aw5/DtcK2FHzigJNJwq1kBsKyw57CpMOi
                   CkPigJQnw5nCgVUcw5bCtl9j4oCcG8OGw5Yiw4zCv1bDrzhBMREIwr1yKMOTT8OMw68OVsOKeGxx
                   wq3Dv0Nkw4vDgcO4wqYCw7DDi8OEFsKjEcOjwrdzw5RUdU/CqwBZw6rDvcKsw67DvE5lwqvDhMKv
                   w5HDiwBy4oC6NsO+w5vigLDDjcOGMHElHA7CjULDnsKtUuKAoH0LUxclPV3FuMO6aWtVAuKAnlcF
                   wr/CsDbCqQEAwr1JMcW+w692w6XCrgbDt8ObZDnDgcOqF8KmwrrCucK9a8Kt4oCdZMOpHiPDigfD
                   hcWSNMOmw7zDhcOPAMOlSzXDs2XDlMKoBcOLdcOMw5PDjeKAusKxw7U94oCgY8Oww6XDrnLigJQj
                   csO7wq8vy5wJMcK4cuKAoMO/MsO/4oCwJcOdXMK0fWxND8uGbiRnBMW4Kw=="""
# Assuming PyCryptodome, which provides AES.new and unpad
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad

cipherbytes = customEncode(ciphertextB64)
key = b'\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0'
iv1 = b'\x02\x13\x24\x35\x46\x57\x68\x79\x8A\x9B\xAC\xBD\xCE\xDF\xE0\xF1'
cipher = AES.new(key, AES.MODE_CBC, iv1)
decrypted = cipher.decrypt(cipherbytes)
decryptedUnpad = unpad(decrypted, AES.block_size)  # strip the PKCS#7 padding
print(decryptedUnpad) # b'<!DOCTYPE html><html><head><title>Register new RikRhino camera</title></head><body><p>IMEI:324<br/>ServerUrl:https://cmorelm.chpc.ac.za/za<br/>Token:1m7e9LaDp42v6l8hm71l5tZe9z4vO4EFDmiZHiH06e4=<br/>destinationGroup:7<br/>Altitude:4.7<br/>Latitude:-33.7498685982923<br/>Longitude:19.3239212036133</p></body></html>'

The decrypted ciphertext is:

<!DOCTYPE html><html><head><title>Register new RikRhino camera</title></head><body><p>IMEI:324<br/>ServerUrl:https://cmorelm.chpc.ac.za/za<br/>Token:1m7e9LaDp42v6l8hm71l5tZe9z4vO4EFDmiZHiH06e4=<br/>destinationGroup:7<br/>Altitude:4.7<br/>Latitude:-33.7498685982923<br/>Longitude:19.3239212036133</p></body></html>

The encoding is unnecessarily complicated. Furthermore, platform-specific dependencies cannot be excluded (e.g. the handling of the undefined codepoints), so that the implementation may be platform-dependent and therefore not reliable.

The most reasonable fix is therefore to use a binary-to-text encoding like Base64 in the C# code instead of the charset encoding (in combination with the default values ASCIIEncoding for mail.BodyEncoding and TransferEncoding.SevenBit for mail.BodyTransferEncoding ). Otherwise, this issue will probably continue to cause difficulties in the future.
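
With that fix, the receiving side becomes trivial. A minimal Python sketch, assuming the sender puts the Base64 text of the ciphertext into the mail body (in C#, `Convert.ToBase64String` would produce such text); the sample bytes are arbitrary:

```python
import base64

# Arbitrary stand-in for a ciphertext: every possible byte value once.
cipherbytes = bytes(range(256))

# Sender side: binary-to-text encoding, the result is pure ASCII.
body = base64.b64encode(cipherbytes).decode("ascii")

# Receiver side: one decoding step gives back the exact bytes.
restored = base64.b64decode(body)

print(body.isascii(), restored == cipherbytes)
```

No charset, code page or undefined codepoint is involved, so the result does not depend on the platform.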


Update: Concerning your question: Why does the encoding seem to fail with Cp437, whereas it is successful with Cp1252? This is weird especially since, as you mentioned, an explicit decoding is performed in the C# code using the default encoding (which we found is Cp437).
From the C# code (meanwhile only accessible via the history) it is only evident that the default encoding is used. That it is Cp437 cannot be deduced; this was information you provided afterwards (originally you stated that it was ISO-8859-1 or UTF-8 or UTF-16, see the history). Since the posted message can be encoded with Cp1252 but not with Cp437, it is more likely that the default actually used was Cp1252 and not Cp437.
The default encoding is platform dependent (another reason not to use it for encoding a ciphertext) and sometimes not very transparent. E.g. on Windows systems there are two code pages ( ANSI and OEM ), which can be quite different (e.g. ANSI: often Cp1252 in Western Europe, OEM: often Cp850 in Western Europe, Cp437 in USA). According to the documentation, the Encoding.Default property returns the ANSI code page. Possibly there is simply a mix-up here.
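
How different these code pages are can be seen by decoding the same two bytes with each of them. A small Python sketch (the sample bytes are chosen only for illustration):

```python
sample = bytes([0x80, 0x9B])

# The same bytes read as ANSI (cp1252) and as two common OEM code pages.
decoded = {name: sample.decode(name) for name in ("cp1252", "cp437", "cp850")}

for name, text in decoded.items():
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

The three code pages yield three different strings, so a sender and receiver that disagree on the "default" will not recover the same bytes.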
I'm not claiming that this is the correct explanation, but it's a possible one. I also don't want to exclude completely the possibility that an encoding of the posted ciphertext with a charset other than Cp1252 is feasible. However, there are convincing reasons (besides the successful encoding and decryption) for Cp1252:

  • The binary data after Base64 decoding corresponds to the allowed and very characteristic UTF-8 sequences without any exception, so that practically with certainty a UTF-8 encoding can be assumed.
  • The characters after UTF-8 decoding in turn correspond to the characters of the Cp1252 charset but not to those of the Cp437 charset (or another one), so that only an encoding with Cp1252 is achievable, but probably with no other charset.
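
The two checks above can be sketched as a small Python helper. `candidate_charsets` is a hypothetical name, the list of candidate charsets is an assumption, and the sample body is synthetic rather than the posted ciphertext.

```python
import base64

# Codepoints that Cp1252 leaves undefined and that were passed through as-is.
C1_PASSTHROUGH = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def candidate_charsets(body_b64, charsets=("cp1252", "cp437", "cp850", "latin-1")):
    """Return the charsets that can encode every character of the body."""
    # Check 1: the Base64-decoded data must be valid UTF-8 (raises otherwise).
    text = base64.b64decode(body_b64).decode("utf-8")
    matches = []
    for name in charsets:
        try:
            # Check 2: every character must exist in the candidate charset.
            for ch in text:
                if name == "cp1252" and ord(ch) in C1_PASSTHROUGH:
                    continue
                ch.encode(name)
        except UnicodeEncodeError:
            continue
        matches.append(name)
    return matches

# Synthetic body containing characters that only Cp1252 has among these four.
sample_body = base64.b64encode("A\u20ac\u0152\u00e9".encode("utf-8"))
print(candidate_charsets(sample_body))
```

A short body may be compatible with several charsets; the longer the analysed text, the more candidates drop out.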

Since the assumption that Cp1252 was used rests on an analysis of the characters decoded with UTF-8, the probability of its correctness depends on the length/number of the analyzed ciphertexts. The posted ciphertext already has a statistically relevant length, but a verification with further ciphertexts to check that assumption is nonetheless advisable.
