Python - Steganography - UnicodeDecode Error

I am writing a Python script to hide data in an image. It hides the message bits in the two least significant bits of the red channel of each pixel of a PNG image. The script works fine for lowercase letters but fails when the message contains a full stop. It produces this error:

Traceback (most recent call last):
  File "E:\Python\Steganography\main.py", line 65, in <module>
    print(unhide('coded-img.png'))
  File "E:\Python\Steganography\main.py", line 60, in unhide
    message = bin2str(binary)
  File "E:\Python\Steganography\main.py", line 16, in bin2str
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 6: invalid start byte

Here is my code:

from PIL import Image

def str2bin(message):
    binary = bin(int.from_bytes(message.encode('utf-8'), 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def hide(filename, message):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'

    data = list(image.getdata())

    newData = []

    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= 2
            pixel[0] <<= 2
            pixel[0] += int('0b' + binary[index:index+2], 2)
            pixel = tuple(pixel)
            index += 2

        newData.append(pixel)

    print(binary)

    image.putdata(newData)
    image.save('coded-'+filename, 'PNG')

def unhide(filename):
    image = Image.open(filename)
    data = image.getdata()

    binary = '0'

    index = 0

    while binary[-8:] != '00000000':
        binary += bin(data[index][0])[-2:]
        index += 1

    binary = binary[:-1]

    print(binary)
    print(index*2)

    message = bin2str(binary)
    return message


hide('img.png', 'alpha.')
print(unhide('coded-img.png'))

Please help. Thanks!

There are at least two problems with your code.

The first problem is that your encoding can be misaligned by 1 bit, since leading zero bits are not included in the output of the bin() function:

>>> bin(int.from_bytes('a'.encode('utf-8'), 'big'))[2:]
'1100001'
# This string is of odd length and (when joined with the terminating '00000000')
# turns into a still odd-length '110000100000000' which is then handled by your
# code as if there was an extra trailing zero (so that the length is even).
# During decoding you compensate for that effect with the
#
#       binary = binary[:-1]
#
# line. The latter is responsible for your stated problem when the binary
# representation of your string is in fact of even length and doesn't need
# the extra bit as in the below example:
>>> bin(int.from_bytes('.'.encode('utf-8'), 'big'))[2:]
'101110'

You would do better to pad your binary string to an even length by prepending an extra zero bit (if needed).
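
A minimal sketch of such padding follows. It goes a little further than padding to an even length: it pads to a whole number of bytes, so that every encoded byte occupies exactly 8 bits. The names str2bin_padded and bin2str_padded are hypothetical and do not appear in the original script.

```python
def str2bin_padded(message):
    """Return the UTF-8 bits of message, zero-padded on the left to whole bytes."""
    data = message.encode('utf-8')
    if not data:
        return ''
    bits = bin(int.from_bytes(data, 'big'))[2:]
    # bin() drops leading zero bits, so restore them: 8 bits per encoded byte.
    return bits.zfill(8 * len(data))


def bin2str_padded(binary):
    """Inverse of str2bin_padded; expects a bit string whose length is a multiple of 8."""
    if len(binary) % 8 != 0:
        raise ValueError('bit string is not byte-aligned')
    if not binary:
        return ''
    # The byte count is taken from the bit string itself, not from bit_length().
    data = int(binary, 2).to_bytes(len(binary) // 8, 'big')
    return data.decode('utf-8')


print(str2bin_padded('a'))   # 7 significant bits, padded to 8
print(str2bin_padded('.'))   # 6 significant bits, padded to 8
```

With byte-aligned input there is no need to start decoding from a '0' prefix or to drop a trailing bit afterwards.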

The other problem is that, while restoring the hidden message, the stopping condition binary[-8:] == '00000000' can be satisfied prematurely when the trailing bits of one symbol and the leading bits of the next (partially restored) symbol together form eight zero bits. This can happen, for example, in the following cases:

  • the symbol @ (ASCII code 64, i.e. the 6 low-order bits unset) followed by any character with an ASCII code less than 64 (i.e. with the 2 highest-order bits unset);

  • a space character (ASCII code 32, i.e. the 4 low-order bits unset) followed by a linefeed/newline character (ASCII code 10, i.e. the 4 high-order bits unset).
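
Both cases can be checked with a few lines of Python. The helper names to_bits and first_zero_run are hypothetical and exist only for this illustration, which assumes every encoded byte is written as exactly 8 bits.

```python
def to_bits(message):
    """Encode message as UTF-8 and write every byte as exactly 8 bits."""
    return ''.join(format(byte, '08b') for byte in message.encode('utf-8'))


def first_zero_run(bits, step=2):
    """Return the first offset, scanned every `step` bits (the decoder reads
    2 bits per pixel), at which 8 consecutive zero bits start, or -1."""
    for start in range(0, len(bits) - 7, step):
        if bits[start:start + 8] == '00000000':
            return start
    return -1


for text in ('@1', ' \n'):
    bits = to_bits(text)
    print(repr(text), bits, first_zero_run(bits))
```

For '@1' the run starts at offset 2 and for ' \n' at offset 4. Neither offset is a multiple of 8, so the run straddles two characters and is not a real terminator.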

You can fix that bug by requiring that a whole number of bytes has been decoded at the moment the last 8 bits all appear to be unset:

while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
    # ...
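
Putting both fixes together, the round trip might look like the sketch below. To stay self-contained it works on a plain list of red-channel values instead of a PIL image, and the names embed and extract are hypothetical stand-ins for hide and unhide. It assumes the message contains no NUL character and pads every byte to 8 bits, which is a stricter variant of the even-length padding suggested above.

```python
def embed(reds, message):
    """Return a copy of reds with message stored in the 2 low bits of each value."""
    data = message.encode('utf-8') + b'\x00'   # one zero byte as terminator
    bits = ''.join(format(byte, '08b') for byte in data)
    if len(bits) // 2 > len(reds):
        raise ValueError('not enough pixels for this message')
    out = list(reds)
    for i in range(0, len(bits), 2):
        # Clear the 2 low bits, then write the next 2 message bits.
        out[i // 2] = (out[i // 2] & ~0b11) | int(bits[i:i + 2], 2)
    return out


def extract(reds):
    """Read 2 bits per value until a byte-aligned zero byte is found."""
    bits = ''
    for value in reds:
        # format(..., '02b') always yields 2 characters, even for values 0 and 1.
        bits += format(value & 0b11, '02b')
        # Only test for the terminator on a byte boundary.
        if len(bits) % 8 == 0 and bits[-8:] == '00000000':
            break
    else:
        raise ValueError('no terminator found')
    payload = bits[:-8]
    data = bytes(int(payload[i:i + 8], 2) for i in range(0, len(payload), 8))
    return data.decode('utf-8')


reds = list(range(256))   # stand-in for the red channel of an image
print(extract(embed(reds, 'alpha. @1 \n')))
```

In the real script the list of red values would come from image.getdata(), and the modified values would be written back with image.putdata() as in the original hide function.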
