How to print non-BMP Unicode characters in Tkinter (e.g. 𝄫)

Question

Note: Non-BMP characters can be displayed in IDLE as of Python 3.8, which was released some time after I posted this question (so it's possible Tkinter might display them now, too, since both use Tcl). I plan to edit this after I try out Python 3.9 (after I install an updated version of Xubuntu). I also read that editing these characters in IDLE might not be as straightforward as editing other characters; see the last comment here.

So, today I was making shortcuts for entering certain Unicode characters. All was going well. Then, when I tried to add 𝄫 and 𝄪 (in my Tkinter program; they wouldn't even try to go in IDLE), I got a strange, unexpected error and my program started deleting just about everything I had written in the text box. That's not acceptable.

Here's the error: _tkinter.TclError: character U+1d12b is above the range (U+0000-U+FFFF) allowed by Tcl

I realize most of the Unicode characters I had been using only had four hex digits in their code points. For some reason, Tcl doesn't like five.
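The range in the error message, U+0000-U+FFFF, is the Basic Multilingual Plane (BMP), which is exactly the set of code points that fit in four hex digits. A minimal sketch for finding the characters in a string that fall outside it; the helper name `non_bmp_chars` is made up for illustration:

```python
def non_bmp_chars(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# 'A', MUSIC FLAT SIGN (inside the BMP), a space, MUSICAL SYMBOL DOUBLE FLAT (outside it)
sample = "A\u266d \U0001D12B"
for ch in non_bmp_chars(sample):
    print("U+{:04X}".format(ord(ch)))  # prints U+1D12B
```

A check like this can be run before handing a string to a widget, instead of waiting for the TclError.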

So, is there any way to print these characters in a ScrolledText widget (let alone without messing everything else up)?

UTF-8 is my encoding. I'm using Python 3.4 (so UTF-8 is the default).

I can print these characters just fine with print().

Entering the character without using ScrolledText.insert (e.g. with Ctrl-Shift-U, or by doing this in the code: b'\xf0\x9d\x84\xab' ) does actually enter it, without that error, but the widget still starts deleting stuff crazily, or adding extra spaces (the character itself gets deleted too, although it reappears randomly at times).
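For reference, the byte string quoted above is the UTF-8 encoding of 𝄫. This small check, which does not touch Tkinter at all, shows how the four bytes relate to the single code point named in the error message:

```python
raw = b'\xf0\x9d\x84\xab'           # the four UTF-8 bytes quoted above
char = raw.decode('utf-8')

assert len(char) == 1                # a single character in Python 3.3+
assert ord(char) == 0x1D12B          # the code point from the TclError
assert char.encode('utf-8') == raw   # encoding it again gives the same bytes
```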

Answer 1

There is currently no way to display those characters as they are supposed to look in Tkinter in Python 3.4 (although someone mentioned how using surrogate pairs may work [in Python 2.x]). However, you can implement methods to convert the characters into displayable codes and back, and just call them whenever necessary. You have to call them when you print to Text widgets, copy/paste, in file dialogs*, in the tab bar, in the status bar, and other stuff.

*The default Tkinter file dialogs do not allow for much internal engineering of the dialogs. I made my own file dialogs, partly to help with this issue. Let me know if you're interested. Hopefully I'll post the code for them here in the future.
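On the surrogate-pair idea mentioned above: a surrogate pair is the UTF-16 way of writing one non-BMP code point as two 16-bit units. The sketch below only shows the arithmetic; the function name `to_surrogate_pair` is hypothetical, and whether Tk renders such a pair correctly is not something this answer confirms.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0xFFFF < code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D12B)
print("U+{:04X} U+{:04X}".format(high, low))  # prints U+D834 U+DD2B
```

Both halves are within U+0000-U+FFFF, which is why the approach is sometimes suggested for getting past the range check.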

These methods convert out-of-range characters into codes and vice versa. The codes are formatted with ordinal numbers, like this: {119083ū} . The braces and the ū are just to distinguish this as a code. {119083ū} represents 𝄫 . As you can see, I haven't yet bothered with a way to escape codes, although I did purposefully try to make the codes very unlikely to occur. The same is true for the ᗍ119083ūᗍ used while converting. Anyway, I'm meaning to add escape sequences eventually. These methods are taken from my class (hence the self ). (And yes, I know you don't have to use semicolons in Python. I just like them and consider that they make the code more readable in some situations.)

import re;

def convert65536(self, s):
    #Converts a string with out-of-range characters in it into a string with codes in it.
    l=list(s);
    i=0;
    while i<len(l):
        o=ord(l[i]);
        if o>65535:
            #Replace the character with its code, e.g. {119083ū}
            l[i]="{"+str(o)+"ū}";
        i+=1;
    return "".join(l);

def parse65536(self, match):
    #Substitution callback for re.sub() in convert65536back()
    #Strip the leading "{" and the trailing "ū}" to get the ordinal
    text=int(match.group()[1:-2]);
    if text>65535:
        return chr(text);
    else:
        #Not an out-of-range ordinal: mark it temporarily so the loop in convert65536back() ends
        return "ᗍ"+str(text)+"ūᗍ";

def convert65536back(self, s):
    #Converts a string with codes in it into a string with out-of-range characters in it
    while re.search(r"\{\d\d\d\d\d+ū\}", s) is not None:
        s=re.sub(r"\{\d\d\d\d\d+ū\}", self.parse65536, s);
    #Turn the temporarily marked codes back into their original form
    s=re.sub(r"ᗍ(\d\d\d\d\d+)ūᗍ", r"{\1ū}", s);
    return s;
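To try the round trip outside a class, the same idea can be sketched as two plain functions. This is not the code above, just a compact variant under the same {ordinalū} format; the names `to_codes` and `from_codes` are made up, and unlike the methods above it leaves ordinals beyond U+10FFFF untouched instead of passing them to chr().

```python
import re

_CODE = re.compile(r"\{(\d{5,})ū\}")  # matches codes such as {119083ū}


def to_codes(s):
    """Replace every character above U+FFFF with a {ordinalū} code."""
    return "".join(c if ord(c) <= 0xFFFF else "{%dū}" % ord(c) for c in s)


def from_codes(s):
    """Turn {ordinalū} codes back into the characters they stand for."""
    def restore(match):
        n = int(match.group(1))
        # Only ordinals of real non-BMP code points are converted back.
        return chr(n) if 0xFFFF < n <= 0x10FFFF else match.group(0)
    return _CODE.sub(restore, s)


original = "x \U0001D12B y"
encoded = to_codes(original)          # "x {119083ū} y"
assert from_codes(encoded) == original
```

As with the original methods, text that already contains something shaped like a code (and naming a non-BMP ordinal) will be converted on the way back, since there is no escaping.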

Answer 2

My answer is based on @Shule's answer but provides more Pythonic, easier-to-read code. It also shows a real use case.

This is the method that populates a tkinter.Listbox with items. There is no back conversion. This solution only takes care of displaying strings that contain characters Tcl does not allow.

import logging
from tkinter import Listbox, END, TclError

_log = logging.getLogger(__name__)

# Placeholder for the strings to display (not defined in the original snippet)
mydata_list = []


class MyListbox(Listbox):
    # ...
    def populate(self):
        """Fill the listbox with the items in mydata_list."""
        def _convert65536(to_convert):
            """Converts a string with out-of-range characters in it into a
            string with codes in it.

            Based on <https://stackoverflow.com/a/28076205/4865723>.
            This is a workaround because Tkinter (Tcl) doesn't allow unicode
            characters outside of a specific range. This could be emoticons
            for example.
            """
            for character in to_convert[:]:
                if ord(character) > 65535:
                    convert_with = '{' + str(ord(character)) + 'ū}'
                    to_convert = to_convert.replace(character, convert_with)
            return to_convert

        # delete all listbox items
        self.delete(0, END)

        # add items to listbox
        for item in mydata_list:
            try:
                self.insert(END, item)
            except TclError as err:
                # Tcl rejected the string: log it and insert the converted form
                _log.warning('{} It will be converted.'.format(err))
                self.insert(END, _convert65536(item))

Answer 1
1 ACCEPTED 2015-01-21 20:27:18

Answer 2
0 2018-02-10 08:55:10
