
How can I work around (or with) this function returning hexadecimal characters when I don't want it to? (Python 3)

I've written an encryption function that works by performing an XOR function on a letter in the plaintext and the corresponding letter in the key. See the code below:

import secrets
import string


def vernam(y):
    # vkey is assumed to be a file object opened for writing elsewhere in the program
    ciphertext = ""  # the ciphertext is built up one character at a time
    vernamkey = []
    for letter in y:
        individualletterkey = secrets.choice(string.ascii_letters)  # this generates a different key for each letter
        vernamkey.append(individualletterkey)
        newletter = chr(ord(letter) ^ ord(individualletterkey))  # XOR the two code points
        print(newletter)
        ciphertext += newletter
    for element in vernamkey:  # this loop ensures that the key for every letter is in a text file that can be passed
        # on to the intended recipient for them to decrypt
        vkey.write(str(element))
        vkey.write("\n")
    return ciphertext

While the encryption function works, for certain Unicode characters that PyCharm (my IDE) seemingly cannot display, the returned ciphertext has hexadecimal in it:

Enter the message to be encrypted Hello world


8
?
;
l

 
=

6
('\x01\x178?;l\x07\x00=\x0e6')

As you can see, for certain characters in the ciphertext what I'm assuming is a sort of placeholder is used. These characters are then represented as hexadecimal in the final output at the bottom. This is a problem because I wish to use this key to decrypt this text, and for that to be done one of two things has to happen:

  1. Convert the hexadecimal into a Unicode character in the final key. Not sure if that would be wise, as multiple different characters will be represented by the same answer

  2. Have the decryption algorithm recognise the hexadecimal characters in the text and convert them into Unicode themselves

How would I accomplish either of these?

The core of the problem you describe is confusion about variable types in Python and about how text is encoded for storage in a file.

A Python string (str) holds Unicode characters, while a byte string (bytes) holds integers in range(0, 256), which are not limited to ASCII codes. Let's put here a bit of Unicode fun from a presentation linked in the comments to your question, which I encourage you to read:

ℛℯα∂α♭ℓℯ ♭ʊ☂ η☺т Ѧ$☾ℐℐ ¡ooʇ ןnɟǝsn sı uʍop-ǝpısdn
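To make the distinction concrete, here is a small sketch that takes the ciphertext shown in the question and inspects it. The variable names are only illustrative. It suggests that the \x01 notation is simply how repr() displays a non-printable character: the string contains a single character at that position, not four.

```python
# The ciphertext from the question, written as a Python string literal.
ciphertext = '\x01\x178?;l\x07\x00=\x0e6'

# Each \xNN escape stands for ONE character, so the length is 11,
# the same as the length of "Hello world".
print(len(ciphertext))

# The first character is the control character with code point 1.
print(ord(ciphertext[0]))

# repr() shows non-printable characters as escapes; this is what you
# see when a string is displayed inside a tuple or at the prompt.
print(repr(ciphertext))

# A bytes object holds integers in range(256); a str holds code points.
# latin-1 maps code points 0..255 to single bytes one-to-one.
as_bytes = ciphertext.encode("latin-1")
print(list(as_bytes))
```

In other words, nothing needs to be "converted back" from hexadecimal: the characters are already there, only their display differs.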

Once you are clear about what you want to achieve, the confusion will go away and you can ask the right questions. I suggest you study how to convert between Unicode and bytes, and what UTF-8, UTF-16 etc. are.

What you see is not what you have got.

This fact is usually the reason why these issues cause so much confusion for so many people. For example, when you see a line break in a text editor, you usually cannot tell whether it consists of two characters (the default on MS Windows) or only one (the default on Unix/Linux systems). The issues related to encoding text, storing it in files and viewing it in a text editor are not trivial and require a solid understanding.
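The line-break example can be checked directly. The sketch below writes the same text twice, forcing each convention through the newline parameter of open(), and then reads the files back in binary mode to show the bytes actually stored. The file names and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

text = "first line\nsecond line\n"

# Hypothetical file names inside a throwaway temporary directory.
tmpdir = tempfile.mkdtemp()
unix_path = os.path.join(tmpdir, "unix.txt")
windows_path = os.path.join(tmpdir, "windows.txt")

# newline="\n" writes line breaks unchanged (Unix/Linux convention).
with open(unix_path, "w", encoding="utf-8", newline="\n") as f:
    f.write(text)

# newline="\r\n" translates every "\n" to "\r\n" (MS Windows convention).
with open(windows_path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write(text)

# Binary mode performs no translation, so we see what is really stored.
with open(unix_path, "rb") as f:
    unix_bytes = f.read()
with open(windows_path, "rb") as f:
    windows_bytes = f.read()

print(unix_bytes)
print(windows_bytes)
```

Both files look identical in most editors, yet the second one is two bytes longer.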

Sorry to say, but there is no way around learning how to specify and use an encoding when writing to and reading from files (unless you always want to rely on external help).
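As a minimal sketch of what specifying the encoding looks like, the following writes a string with an explicit encoding and reads it back twice: once with the same encoding and once with a different one. The message and the file name are made up for the example.

```python
import os
import tempfile

# "naive cafe" with accented letters plus an umbrella symbol, written
# with escapes so the example does not depend on the editor's encoding.
message = "na\u00efve caf\u00e9 \u2602"

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "message.txt")

# Always state the encoding explicitly instead of relying on the
# platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(message)

# Reading with the same encoding gives back the original text.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
print(read_back == message)

# Reading with a different encoding raises no error here, but the
# result is not the text that was written.
with open(path, "r", encoding="latin-1") as f:
    garbled = f.read()
print(garbled == message)
```

The second read fails silently, which is exactly why the encoding used for writing has to be known when reading.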

Without the code for both encryption and decryption, and the code for both writing to and reading from the file, it is hard, if not impossible, to tell whether things will work out as expected.
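Since the question shows neither the decryption code nor the file-reading code, the following is only a sketch of one way a full round trip could look, not the asker's method. It swaps in a different technique: it works on bytes instead of str, draws the key with secrets.token_bytes instead of secrets.choice(string.ascii_letters), and stores key and ciphertext in binary files so that no text encoding is involved. The helper xor_bytes and the file names are hypothetical.

```python
import os
import secrets
import tempfile


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "Hello world"

# Convert the text to bytes once, with an explicit encoding.
plain_bytes = plaintext.encode("utf-8")

# One random key byte per plaintext byte.
key = secrets.token_bytes(len(plain_bytes))
cipher_bytes = xor_bytes(plain_bytes, key)

# Hypothetical file names; "wb" writes the bytes exactly as they are.
tmpdir = tempfile.mkdtemp()
key_path = os.path.join(tmpdir, "key.bin")
cipher_path = os.path.join(tmpdir, "cipher.bin")
with open(key_path, "wb") as f:
    f.write(key)
with open(cipher_path, "wb") as f:
    f.write(cipher_bytes)

# The recipient reads both files in binary mode and XORs again.
with open(key_path, "rb") as f:
    stored_key = f.read()
with open(cipher_path, "rb") as f:
    stored_cipher = f.read()

decrypted = xor_bytes(stored_cipher, stored_key).decode("utf-8")
print(decrypted)
```

Because XOR is its own inverse, the same function serves for encryption and decryption, and control bytes such as 0x00 or 0x07 survive the trip through the files unchanged.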

The confusion already begins with the question: how do you read and decode text stored in a file into a Python variable? Are bytes stored in the file? Are Unicode characters stored as UTF-8 or UTF-16? Or are code pages used? Which encoding was used to write to the file? Which encoding is used to read from it?
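Why these questions matter can be seen with a single character. The sketch below encodes the same one-character string with three encodings and then decodes the UTF-8 bytes with the wrong one. The choice of character and encodings is only an illustration.

```python
text = "\u00e9"  # the single character "e with acute accent"

# The same character becomes different bytes under different encodings.
encoded = {}
for encoding in ("utf-8", "utf-16-le", "latin-1"):
    encoded[encoding] = text.encode(encoding)
    print(encoding, encoded[encoding])

# Decoding with the encoding that was used for writing restores the text.
restored = encoded["utf-8"].decode("utf-8")
print(restored == text)

# Decoding with a different encoding gives two unrelated characters
# instead of one; ascii() is used so the output prints on any console.
wrong = encoded["utf-8"].decode("latin-1")
print(ascii(wrong))
```

A file is only a sequence of bytes, so the reader has to be told (or has to guess) which of these mappings was used by the writer.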

It seems that you are not aware of all the issues mentioned above. But you should be, if you want to understand how to fix things when they go wrong.

A good place to start learning about encoding is this Stack Overflow question (How to know the encoding of a file in Python? [duplicate]), which I found using a search engine and the keywords 'python file encoding', or this one: What is character encoding and why should I bother with it.

I have already written here on Stack Overflow on the subject of encoding (use 'user:7711283 encoding' in Stack Overflow's own search for a complete list of 8 results). Look here (If you have a string/text in Python (or in a file), you will never be able to see it 'as it is'). The better you understand why you can never see a string 'as it is', the less confused you will be about what you see. Look also here (there is NO WAY to avoid encoding/decoding, but there is a way of doing it implicitly).

The next step would be to find out which file encoding your text editor uses when it saves or loads a Python script or text, which helps you interpret what you actually see displayed in the editor. Below is a hint on where to look for this information: [image: the Encoding entry in the File menu of a text file editor]


 