
How to print non-BMP Unicode characters in Tkinter (e.g. 𝄫)

Note: As of Python 3.8, non-BMP characters can be displayed in IDLE, so Tkinter may be able to display them now too, since both use Tcl. Python 3.8 was released some time after I posted this question. I plan to edit this after I try out Python 3.9 (once I install an updated version of Xubuntu). I also read that editing these characters in IDLE might not be as straightforward as editing other characters; see the last comment here.


So, today I was making shortcuts for entering certain Unicode characters. All was going well. Then, when I tried to add 𝄫 and 𝄪 (in my Tkinter program; they wouldn't even go into IDLE), I got a strange, unexpected error, and my program started deleting just about everything I had written in the text box. That's not acceptable.

Here's the error: _tkinter.TclError: character U+1d12b is above the range (U+0000-U+FFFF) allowed by Tcl

I realize that most of the Unicode characters I had been using have only four hexadecimal digits in their code points. For some reason, Tcl doesn't like five.
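The four-versus-five-digit observation matches the range in the error message: code points up to U+FFFF belong to the Basic Multilingual Plane (BMP), and anything above that needs at least five hexadecimal digits. A minimal sketch for spotting such characters before they reach a widget follows; the helper name `non_bmp_chars` is made up for illustration and is not part of Tkinter.

```python
def non_bmp_chars(s):
    """Return the characters in s whose code point lies above U+FFFF."""
    return [ch for ch in s if ord(ch) > 0xFFFF]


sample = "A♭ and B𝄫"
for ch in non_bmp_chars(sample):
    # ♭ (U+266D) fits in four hex digits and is skipped;
    # 𝄫 (U+1D12B) needs five, so it is reported.
    print(f"U+{ord(ch):04X}")
```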

So, is there any way to print these characters in a ScrolledText widget (let alone without messing everything else up)?

UTF-8 is my encoding. I'm using Python 3.4 (so UTF-8 is the default).

I can print these characters just fine with the print function.

Entering the character by some means other than ScrolledText.insert (e.g. Ctrl-Shift-U, or via b'\xf0\x9d\x84\xab' in the code) does actually enter it without that error, but the widget still starts deleting things erratically or adding extra spaces (the character itself gets deleted too, although it reappears randomly at times).

There is currently no way to display those characters as they are supposed to look in Tkinter with Python 3.4 (although someone mentioned that using surrogate pairs may work in Python 2.x). However, you can implement methods that convert the characters into displayable codes and back, and call them whenever necessary. You have to call them when you print to Text widgets, when copying and pasting, in file dialogs*, in the tab bar, in the status bar, and anywhere else such text appears.

*The default Tkinter file dialogs do not allow for much internal engineering of the dialogs. I made my own file dialogs, partly to help with this issue. Let me know if you're interested. Hopefully I'll post the code for them here in the future.
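For reference, the surrogate pairs mentioned above are the two 16-bit units that UTF-16 uses to represent one non-BMP code point. The sketch below only shows how the pair is computed; it makes no claim about whether any particular Tkinter or Tcl build will render the result. The function name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogate values for one non-BMP character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("character is already inside the BMP")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair("𝄫")
print(f"{high:04X} {low:04X}")
```

The same two values can be cross-checked against Python's own encoder with "𝄫".encode("utf-16-be").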

These methods convert out-of-range characters into codes and vice versa. Each code contains the character's code point as a decimal number, like this: {119083ū}. The braces and the ū are just there to mark it as a code. {119083ū} represents 𝄫. As you can see, I haven't yet bothered with a way to escape codes, although I did purposefully try to make the codes very unlikely to occur naturally. The same is true for the ᗍ119083ūᗍ form used temporarily while converting back. Anyway, I mean to add escape sequences eventually. These methods are taken from my class (hence the self). (And yes, I know you don't have to use semicolons in Python. I just like them and consider that they make the code more readable in some situations.)

import re;

def convert65536(self, s):
    #Converts a string with out-of-range characters in it into a string with codes in it.
    l=list(s);
    i=0;
    while i<len(l):
        o=ord(l[i]);
        if o>65535:
            l[i]="{"+str(o)+"ū}";
        i+=1;
    return "".join(l);
def parse65536(self, match):
    #This is a regular expression method used for substitutions in convert65536back()
    text=int(match.group()[1:-2]);
    #chr() only accepts code points up to 0x10FFFF (1114111); larger numbers stay as codes
    if 65535<text<=1114111:
        return chr(text);
    else:
        return "ᗍ"+str(text)+"ūᗍ";
def convert65536back(self, s):
    #Converts a string with codes in it into a string with out-of-range characters in it
    while re.search(r"\{\d\d\d\d\d+ū\}", s) is not None:
        s=re.sub(r"\{\d\d\d\d\d+ū\}", self.parse65536, s);
    s=re.sub(r"ᗍ(\d\d\d\d\d+)ūᗍ", r"{\1ū}", s);
    return s;
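A round trip through the two conversions can be sketched with standalone functions. This is not the code above but a compact equivalent written for illustration: the names `encode_non_bmp` and `decode_non_bmp` are made up, and codes whose number is not a valid non-BMP code point are simply left in place instead of going through the temporary ᗍ...ᗍ form.

```python
import re

# Same code format as above: "{", at least five decimal digits, "ū", "}".
CODE_PATTERN = re.compile(r"\{(\d{5,})ū\}")


def encode_non_bmp(s):
    """Replace every character above U+FFFF with a {<ordinal>ū} code."""
    return "".join("{%dū}" % ord(ch) if ord(ch) > 0xFFFF else ch for ch in s)


def decode_non_bmp(s):
    """Turn {<ordinal>ū} codes back into the characters they stand for."""
    def restore(match):
        value = int(match.group(1))
        # Only numbers in the non-BMP range are converted; anything else stays a code.
        if 0xFFFF < value <= 0x10FFFF:
            return chr(value)
        return match.group(0)
    return CODE_PATTERN.sub(restore, s)


original = "B𝄫 then F𝄪"
encoded = encode_non_bmp(original)   # safe to hand to a widget that rejects non-BMP text
print(encoded)
print(decode_non_bmp(encoded) == original)
```

The encoded form is what would be passed to insert(), and the decoded form is what would be written to a file or the clipboard.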

My answer is based on @Shule's answer but provides more Pythonic and easier-to-read code. It also provides a real use case.

This is the method that populates a tkinter.Listbox with items. There is no back conversion. This solution only takes care of displaying strings that contain characters Tcl does not allow.

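import logging
from tkinter import Listbox, END, TclError

# Assumed module-level logger; the original snippet uses _log without defining it.
_log = logging.getLogger(__name__)

class MyListbox (Listbox):
    # ...
    def populate(self):
        """Fill the listbox from mydata_list (defined elsewhere), converting
        any characters that Tcl rejects into displayable codes.
        """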
        def _convert65536(to_convert):
            """Converts a string with out-of-range characters in it into a
            string with codes in it.

            Based on <https://stackoverflow.com/a/28076205/4865723>.
            This is a workaround because Tkinter (Tcl) doesn't allow unicode
            characters outside of a specific range. This could be emoticons
            for example.
            """
            for character in to_convert:
                if ord(character) > 65535:
                    convert_with = '{' + str(ord(character)) + 'ū}'
                    to_convert = to_convert.replace(character, convert_with)
            return to_convert

        # delete all listbox items
        self.delete(0, END)

        # add items to listbox
        for item in mydata_list:
            try:
                self.insert(END, item)
            except TclError as err:
                _log.warning('{} It will be converted.'.format(err))
                self.insert(END, _convert65536(item))

