Why does wx.TextCtrl.SetStyle mishandle emojis?

I'm developing an application using wxPython 4.0.4 in Python 3.7.3, and I've run into a problem when trying to color UTF-8 text in a wx.TextCtrl. Basically, it seems that certain characters are counted incorrectly within wxPython despite them being counted correctly in Python.

I initially thought that all multi-byte characters were being miscounted; however, my example code below shows this is not the case. It appears to be a problem specifically in the wx.TextCtrl.SetStyle function.

import wx

app = wx.App()

test_str1 = '''There are no multibyte characters '''
test_str2 = '''blah ble blah\n'''
test_str3 = '''“these are multibyte quotes” '''
test_str4 = '''more single byte chars!\n'''
test_str5 = '''this comma’s represented by multiple bytes\n'''
test_str6 = '''why do emojis 💨 💨 💨 seem to break TextCtrl.SetStyle 💨 💨 💨\n'''
test_str7 = '''more single byte characters\n'''
test_str8 = '''to demonstrate the issue.'''

def main():
    main = TestFrame()
    main.Show()
    app.MainLoop()

class TestFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="TestFrame")
        sizer = wx.BoxSizer(wx.VERTICAL)
        self.panel = TestPanel(self)
        sizer.Add(self.panel, proportion=1, flag=wx.EXPAND)
        # Attach the sizer to the frame; otherwise it is created but never used.
        self.SetSizer(sizer)

class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        self.text = wx.TextCtrl(self, wx.ID_ANY, style=(wx.TE_MULTILINE|wx.TE_RICH|wx.TE_READONLY))
        self.raw_text = ""
        self.styles = []
        self.AddColorText(test_str1, wx.BLUE)
        self.AddColorText(test_str2, wx.RED)
        self.AddColorText(test_str3, wx.BLUE)
        self.AddColorText(test_str4, wx.RED)
        self.AddColorText(test_str5, wx.BLUE)
        self.AddColorText(test_str6, wx.RED)
        self.AddColorText(test_str7, wx.BLUE)
        self.AddColorText(test_str8, wx.RED)
        self.text.SetValue(self.raw_text)
        for s in self.styles:
            self.text.SetStyle(s[0], s[1], s[2])
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.text, proportion=1, flag=wx.EXPAND)
        self.SetSizer(sizer)

    def AddColorText(self, text, wx_color):
        start = len(self.raw_text)
        self.raw_text += text
        end = len(self.raw_text)
        self.styles.append([start, end, wx.TextAttr(wx_color)])

if __name__ == "__main__":
    main()

MS Windows uses UTF-16 internally, and before PEP 393, CPython Unicode strings were also 16-bit on Windows because of that. With PEP 393 (Python 3.3 and later), CPython represents every Unicode code point as a single string element, so one code point always has a string length of 1.

The Windows API, on the other hand, cannot. So wxPython must translate strings into UTF-16 before sending them to the operating system. For everything in the Basic Multilingual Plane (BMP) of Unicode, which is most of what you'll encounter, that works out fine, because one Unicode code point becomes one UTF-16 code unit (two bytes).

But those new emoji are not in the BMP, so each one becomes two UTF-16 code units (a surrogate pair, four bytes). wxPython apparently fails to account for that: if it passes the start and end counters straight through to an underlying Windows function, they will be off after an emoji, because the values you supply count Unicode code points, while the values Windows expects count UTF-16 code units.
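
To make the mismatch concrete, here is a minimal sketch comparing Python's len() with the number of UTF-16 code units for a few strings. The helper name utf16_units is hypothetical (it is not part of wxPython or the standard library); it only relies on str.encode, which does exist.

```python
def utf16_units(s: str) -> int:
    """Return how many UTF-16 code units are needed to encode s.

    'utf-16-le' is used so that no byte order mark is prepended;
    every code unit is two bytes, hence the division by 2.
    """
    return len(s.encode("utf-16-le")) // 2


samples = ["plain ascii", "“quotes”", "💨"]
for s in samples:
    # Code-point count (what len() reports) vs. UTF-16 code-unit count.
    print(repr(s), len(s), utf16_units(s))

# The curly quotes are multi-byte in UTF-8 but still inside the BMP,
# so both counts agree. The emoji is outside the BMP: len() is 1,
# but it needs 2 UTF-16 code units.
```

This matches the behaviour in the question: the curly quotes and apostrophe are styled correctly, and the ranges only drift once an emoji has appeared.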

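You can work around it by computing UTF-16 offsets yourself to pass to SetStyle. Encode as 'utf-16-le' so that no byte order mark is prepended, and divide by 2 to convert the byte count into a code-unit count:

utf16start = len(self.raw_text[:start].encode('utf-16-le')) // 2
utf16end = utf16start + len(self.raw_text[start:end].encode('utf-16-le')) // 2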
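
The same conversion can be wrapped in a small function and checked without a GUI. This is only a sketch: utf16_range is a hypothetical helper name, and the assumption that SetStyle wants UTF-16 code-unit offsets applies to the Windows behaviour described above; other platforms may count positions differently.

```python
def utf16_range(text: str, start: int, end: int) -> tuple:
    """Convert a [start, end) range of code-point indices into
    the equivalent range of UTF-16 code-unit indices."""
    # Code units before the range: encode the prefix without a BOM
    # and divide the byte count by 2.
    utf16_start = len(text[:start].encode("utf-16-le")) // 2
    # Code units inside the range, added on top of the start offset.
    utf16_end = utf16_start + len(text[start:end].encode("utf-16-le")) // 2
    return utf16_start, utf16_end


text = "a 💨 b"
print(utf16_range(text, 0, 2))  # (0, 2): nothing shifts before the emoji
print(utf16_range(text, 2, 3))  # (2, 4): the emoji itself spans two units
print(utf16_range(text, 4, 5))  # (5, 6): everything after it shifts by one
```

In the question's code, the loop over self.styles would then pass the converted pair to SetStyle instead of s[0] and s[1]. Encoding the prefix for every range is quadratic in the text length, which is fine for a short demo but worth replacing with a running offset for large documents.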

Arguably this is a bug in wxPython, and you should report it to the wxPython issue tracker.

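It appears that my issue here was using Python's built-in len() to calculate positions instead of the wx library functions. The code below resolves my problem by appending each string with AppendText and reading the start and end positions from GetLastPosition.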

import wx

app = wx.App()

test_str1 = '''There are no multibyte characters '''
test_str2 = '''blah ble blah\n'''
test_str3 = '''“these are multibyte quotes” '''
test_str4 = '''more single byte chars!\n'''
test_str5 = '''this comma’s represented by multiple bytes\n'''
test_str6 = '''why do emojis 💨 💨 💨 seem to break TextCtrl.SetStyle 💨 💨 💨\n'''
test_str7 = '''more single byte characters\n'''
test_str8 = '''to demonstrate the issue.'''

def main():
    main = TestFrame()
    main.Show()
    app.MainLoop()

class TestFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="TestFrame")
        sizer = wx.BoxSizer(wx.VERTICAL)
        self.panel = TestPanel(self)
        sizer.Add(self.panel, proportion=1, flag=wx.EXPAND)
        # Attach the sizer to the frame; otherwise it is created but never used.
        self.SetSizer(sizer)

class TestPanel(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        self.text = wx.TextCtrl(self, wx.ID_ANY, style=(wx.TE_MULTILINE|wx.TE_RICH|wx.TE_READONLY))
        self.raw_text = ""
        self.styles = []
        self.AddColorText(test_str1, wx.BLUE)
        self.AddColorText(test_str2, wx.RED)
        self.AddColorText(test_str3, wx.BLUE)
        self.AddColorText(test_str4, wx.RED)
        self.AddColorText(test_str5, wx.BLUE)
        self.AddColorText(test_str6, wx.RED)
        self.AddColorText(test_str7, wx.BLUE)
        self.AddColorText(test_str8, wx.RED)
        for s in self.styles:
            self.text.SetStyle(s[0], s[1], s[2])
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.text, proportion=1, flag=wx.EXPAND)
        self.SetSizer(sizer)

    def AddColorText(self, text, wx_color):
        start = self.text.GetLastPosition()
        self.text.AppendText(text)
        end = self.text.GetLastPosition()
        self.styles.append([start, end, wx.TextAttr(wx_color)])

if __name__ == "__main__":
    main()
