UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 65534-65535: unexpected end of data

I want to encrypt a file with simple AES encryption. Here is my Python 3 source code.

import os, random, struct
from Crypto.Cipher import AES

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.enc'
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)
                outfile.write(encryptor.encrypt(chunk.decode('UTF-8','strict')))

It works fine for some files but raises an error for others, as shown below:

encrypt_file("qwertyqwertyqwer",'/tmp/test1' , out_filename=None, chunksize=64*1024)

No error; this works fine.

encrypt_file("qwertyqwertyqwer",'/tmp/test2' , out_filename=None, chunksize=64*1024)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in encrypt_file
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 65534-65535: unexpected end of data

How can I fix my encrypt_file function?

As tmadam suggested, I changed

outfile.write(encryptor.encrypt(chunk.decode('UTF-8','strict')))

to

outfile.write(encryptor.encrypt(chunk))

and tried it on a file:

encrypt_file("qwertyqwertyqwer",'/tmp/test' , out_filename=None, chunksize=64*1024)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 16, in encrypt_file
TypeError: can't concat bytes to str

The main issue with your code is that you're using strings. AES works with binary data, and if you were using PyCryptodome this code would raise a TypeError:

Object type <class 'str'> cannot be passed to C code

PyCrypto accepts strings but encodes them to bytes internally, so there is no point in decoding your bytes to a string: it will just be encoded back to bytes. It also encodes with ASCII (tested with PyCrypto v2.6.1, Python v2.7), so code such as this:

encryptor.encrypt(u'ψ' * 16)

would raise a UnicodeEncodeError:

File "C:\Python27\lib\site-packages\Crypto\Cipher\blockalgo.py", line 244, in encrypt
    return self._cipher.encrypt(plaintext)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-15

You should always use bytes when encrypting or decrypting data. Then you can decode the plaintext to string, if it is text.
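
To see why the decode step fails for only some files, here is a minimal sketch that reproduces the error from the question without any crypto library. The sample data is made up for illustration: a fixed-size read can cut a multi-byte UTF-8 character in half, and the truncated chunk is then not valid UTF-8.

```python
# Hypothetical sample data: 65534 ASCII bytes followed by one 3-byte
# character (U+20AC, the euro sign), 65537 bytes in total.
data = ("a" * 65534 + "\u20ac").encode("utf-8")

chunksize = 64 * 1024
chunk = data[:chunksize]  # what infile.read(chunksize) would return

try:
    chunk.decode("utf-8", "strict")
    error = None
except UnicodeDecodeError as exc:
    # The chunk ends after 2 of the 3 bytes of the euro sign.
    error = exc

print(error)
```

A file whose characters all happen to fit inside chunk boundaries (for example, pure ASCII) decodes without error, which would explain why only some files fail.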

The next issue is your padding method. It produces a string and so you're getting a TypeError when you try to apply it to the plaintext, which should be bytes. You can fix this if you pad with bytes,

chunk += b' ' * (16 - len(chunk) % 16)

but it would be best to use the PKCS7 padding (currently you're using zero padding, but with a space instead of a zero byte).
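
As a small illustration of the str/bytes mismatch (Python 3, with a made-up five-byte chunk), concatenating a str to bytes raises a TypeError, while padding with a bytes literal works:

```python
chunk = b"hello"  # data read from a file opened in 'rb' mode is bytes

try:
    chunk + " " * (16 - len(chunk) % 16)  # str padding, as in the question
    concat_failed = False
except TypeError:
    concat_failed = True

# Padding with bytes works, but the pad cannot be told apart from
# real trailing spaces when the file is decrypted later.
padded = chunk + b" " * (16 - len(chunk) % 16)

print(concat_failed, len(padded))
```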

PyCryptodome provides padding functions, but it seems you're using PyCrypto. In this case, you could implement PKCS7 padding, or better yet copy PyCryptodome's padding functions.

try:
    from Crypto.Util.Padding import pad, unpad
except ImportError:
    from Crypto.Util.py3compat import bchr, bord

    def pad(data_to_pad, block_size):
        padding_len = block_size-len(data_to_pad)%block_size
        padding = bchr(padding_len)*padding_len
        return data_to_pad + padding

    def unpad(padded_data, block_size):
        pdata_len = len(padded_data)
        if pdata_len % block_size:
            raise ValueError("Input data is not padded")
        padding_len = bord(padded_data[-1])
        if padding_len<1 or padding_len>min(block_size, pdata_len):
            raise ValueError("Padding is incorrect.")
        if padded_data[-padding_len:]!=bchr(padding_len)*padding_len:
            raise ValueError("PKCS#7 padding is incorrect.")
        return padded_data[:-padding_len]

The pad and unpad functions were copied from Crypto.Util.Padding and modified to use only PKCS7 padding. Note that when using PKCS7 padding it is important to pad the last chunk, even if its size is a multiple of the block size, otherwise you won't be able to unpad correctly.
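
The point about always padding the last chunk can be checked with a pure Python 3 sketch. The names pkcs7_pad and pkcs7_unpad are illustrative stand-ins written for this example, not the library functions:

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data, block_size=BLOCK_SIZE):
    # Always adds between 1 and block_size bytes, each equal to the pad length.
    padding_len = block_size - len(data) % block_size
    return data + bytes([padding_len]) * padding_len

def pkcs7_unpad(padded, block_size=BLOCK_SIZE):
    if not padded or len(padded) % block_size:
        raise ValueError("Input data is not padded")
    padding_len = padded[-1]
    if padding_len < 1 or padding_len > block_size:
        raise ValueError("Padding is incorrect.")
    if padded[-padding_len:] != bytes([padding_len]) * padding_len:
        raise ValueError("PKCS#7 padding is incorrect.")
    return padded[:-padding_len]

# A chunk that is already a multiple of the block size gets a full extra block.
exact = b"A" * 16
print(len(pkcs7_pad(exact)))

# Data that happens to end in b"\x01" survives a pad/unpad round trip...
tricky = b"B" * 15 + b"\x01"
print(pkcs7_unpad(pkcs7_pad(tricky)) == tricky)

# ...but if padding were skipped, unpad would strip a real data byte.
print(pkcs7_unpad(tricky) == b"B" * 15)
```

The last line shows the failure mode: without the extra block, the final byte of the data is misread as padding and silently removed.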

Applying those changes to the encrypt_file function,

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.enc'
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)
            pos = 0
            while pos < filesize:
                chunk = infile.read(chunksize)
                pos += len(chunk)
                if pos == filesize:
                    chunk = pad(chunk, AES.block_size)
                outfile.write(encryptor.encrypt(chunk))

and the matching decrypt_file function,

def decrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.dec'
    with open(in_filename, 'rb') as infile:
        filesize = struct.unpack('<Q', infile.read(8))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)
        with open(out_filename, 'wb') as outfile:
            encrypted_filesize = os.path.getsize(in_filename)
            pos = 8 + 16 # the filesize and IV.
            while pos < encrypted_filesize:
                chunk = infile.read(chunksize)
                pos += len(chunk)
                chunk = decryptor.decrypt(chunk)
                if pos == encrypted_filesize:
                    chunk = unpad(chunk, AES.block_size)
                outfile.write(chunk)
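
Both functions rely on the same file layout: an 8-byte little-endian file size, then the 16-byte IV, then the ciphertext. Here is a minimal sketch of that header using only the standard library. The helper names build_header and parse_header and the size 70000 are made up for illustration:

```python
import os
import struct

HEADER_FORMAT = "<Q"  # unsigned 64-bit little-endian, as in struct.pack('<Q', filesize)
IV_SIZE = 16          # AES block size in bytes

def build_header(filesize, iv):
    # What encrypt_file writes before the ciphertext.
    return struct.pack(HEADER_FORMAT, filesize) + iv

def parse_header(blob):
    # What decrypt_file reads back: first the size, then the IV.
    size_len = struct.calcsize(HEADER_FORMAT)  # 8 bytes
    (filesize,) = struct.unpack(HEADER_FORMAT, blob[:size_len])
    iv = blob[size_len:size_len + IV_SIZE]
    return filesize, iv

iv = os.urandom(IV_SIZE)
header = build_header(70000, iv)

print(len(header))  # 8 + 16, the starting value of pos in decrypt_file
print(parse_header(header) == (70000, iv))
```

Since the padding already records how many bytes to strip, decrypt_file does not need the stored file size to recover the data; it could still serve as a sanity check on the output.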

This code is Python2/Python3 compatible, and it should work either with PyCryptodome or with PyCrypto.

However, if you're using PyCrypto, I recommend updating to PyCryptodome. PyCryptodome is a fork of PyCrypto and exposes the same API (so you won't have to change your code much), plus some extra features: padding functions, authenticated encryption algorithms, KDFs, etc. PyCrypto, on the other hand, is no longer maintained, and some versions suffer from a heap-based buffer overflow vulnerability: CVE-2013-7459.

In addition to the accepted answer, I believe showing multiple implementations of simple AES encryption can be useful for readers/new learners:

import os
import sys
import pickle
import base64
import hashlib
import errno

from Crypto import Random
from Crypto.Cipher import AES

DEFAULT_STORAGE_DIR = os.path.join(os.path.dirname(__file__), '.ncrypt')

def create_dir(dir_name):
    """ Safely create a new directory. """
    try:
        os.makedirs(dir_name)
        return dir_name
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise OSError('Unable to create directory.')


class AESCipher(object):
    DEFAULT_CIPHER_PICKLE_FNAME = "cipher.pkl"

    def __init__(self, key):
        self.bs = AES.block_size  # AES always uses 16-byte blocks
        self.key = hashlib.sha256(key.encode()).digest()  # 32 bytes, i.e. AES-256

    def encrypt(self, raw):
        # Encode first so the padding is computed on bytes, not characters.
        raw = self._pad(raw.encode('utf-8'))
        iv = Random.new().read(AES.block_size)
        cipher = AES.new(self.key, AES.MODE_CBC, iv)
        return base64.b64encode(iv + cipher.encrypt(raw))

    def decrypt(self, enc):
        enc = base64.b64decode(enc)
        iv = enc[:AES.block_size]
        cipher = AES.new(self.key, AES.MODE_CBC, iv)
        return self._unpad(cipher.decrypt(enc[AES.block_size:])).decode('utf-8')

    def _pad(self, s):
        # PKCS7-style padding on bytes (Python 3): append n bytes of value n.
        n = self.bs - len(s) % self.bs
        return s + bytes([n]) * n

    @staticmethod
    def _unpad(s):
        # The last byte holds the padding length.
        return s[:-ord(s[len(s)-1:])]

Here is an example of how the AESCipher class can be used:

# Note: `pickle_fname` (a file path) is not defined in this snippet, and
# `cipher` only exists after option G has been chosen at least once.
while True:
    option = input('\n'.join(["="*80,
                              "| Select an operation:",
                              "| 1) E : Encrypt",
                              "| 2) D : Decrypt",
                              "| 3) H : Help",
                              "| 4) G : Generate new cipher",
                              "| 5) Q : Quit",
                              "="*80,
                              "> "])).lower()
    print()

    # input() returns a string, so compare against '1', not the integer 1.
    if option in ('e', '1'):
        plaintext = input('Enter plaintext to encrypt: ')
        print("Encrypted: {}".format(cipher.encrypt(plaintext).decode("utf-8")))

    elif option in ('d', '2'):
        ciphertext = input('Enter ciphertext to decrypt: ')
        print("Decrypted: {}".format(cipher.decrypt(ciphertext.encode("utf-8"))))

    elif option in ('h', '3'):
        print("Help:\n\tE: Encrypt plaintext\n\tD: Decrypt ciphertext.")

    elif option in ('g', '4'):
        if input("Are you sure? [yes/no]: ").lower() in ["yes", "y"]:
            cipher = AESCipher(key=input('Enter cipher password: '))

            with open(pickle_fname, 'wb') as f:
                pickle.dump(cipher, f)
            print("Generated new cipher.")

    elif option in ('q', '5'):
        break  # leave the loop instead of raising EOFError
    else:
        print("Unknown operation.")
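
The AESCipher class above turns the password into a 32-byte key with a single SHA-256 hash. The sketch below shows that step on its own and, as an assumption of mine rather than something this answer does, one common alternative from the standard library: hashlib.pbkdf2_hmac with a random salt. The password, salt size and iteration count are illustrative values only.

```python
import hashlib
import os

password = "correct horse"  # hypothetical password

# What AESCipher.__init__ does: one SHA-256 pass gives a 32-byte key (AES-256).
key = hashlib.sha256(password.encode()).digest()
print(len(key))

# Alternative (not part of the class above): a salted, iterated KDF.
# The salt would need to be stored next to the ciphertext to re-derive the key.
salt = os.urandom(16)
iterations = 100000  # illustrative value
stretched = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
print(len(stretched))
```

Both results are 32 bytes long, so either can be passed to AES.new as the key; the iterated version makes guessing passwords slower.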

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 