Remove characters outside of the BMP (emoji) in Python 3

I get this error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 266-266: Non-BMP character not supported in Tk

I'm parsing data, and some emoji end up in the resulting array.

data = "this variable contains some emoji'sツ😂"

I want:

data = "this variable contains some emoji's"

How can I remove these characters from my data, or otherwise handle this situation, in Python 3?

If the goal is just to remove all characters above '\uFFFF', the straightforward approach is to do just that:

data = "this variable contains some emoji'sツ😂"
data = ''.join(c for c in data if c <= '\uFFFF')
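
The same filter can be wrapped in a small reusable helper. The following is only a sketch and not part of the original answer: the name `strip_non_bmp` and its `replacement` parameter are hypothetical, and it uses a regular expression rather than the generator expression above. Passing a replacement such as U+FFFD leaves a visible placeholder instead of silently dropping the character, which may be preferable when the text is displayed in Tk.

```python
import re

# Matches any code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
_NON_BMP = re.compile(r"[^\u0000-\uFFFF]")


def strip_non_bmp(text, replacement=""):
    """Return text with every non-BMP character replaced (removed by default).

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    return _NON_BMP.sub(replacement, text)


data = "this variable contains some emoji'sツ😂"
print(strip_non_bmp(data))            # this variable contains some emoji'sツ
print(strip_non_bmp(data, "\uFFFD"))  # emoji replaced by U+FFFD
```

Note that ツ (U+30C4) is inside the BMP, so this helper keeps it.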

Your string may be in decomposed form, so you may need to normalize it to composed form (NFC) first so that the non-BMP characters are identifiable:

import unicodedata

data = ''.join(c for c in unicodedata.normalize('NFC', data) if c <= '\uFFFF')
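
To make "decomposed" versus "composed" concrete, here is a minimal illustration of what NFC normalization does. The sample strings are made up for this sketch and do not come from the question. In this particular sample the characters ツ and 😂 are already single code points, so normalization leaves them unchanged.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT (U+0301): two code points, one visible glyph.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 2
print(len(composed))    # 1 (U+00E9, LATIN SMALL LETTER E WITH ACUTE)

# The sample characters from the question are unaffected by NFC.
sample = "ツ😂"
print(unicodedata.normalize("NFC", sample) == sample)  # True
```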
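
Alternatively, to keep only printable ASCII characters (this also drops BMP characters such as ツ):

>>> import string
>>> printable = set(string.printable)
>>> ''.join(filter(lambda x: x in printable, data))
"this variable contains some emoji's"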
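
The two approaches above do not give the same result, which matters when choosing between them. A short side-by-side sketch (the variable names `bmp_only` and `ascii_printable` are introduced here for illustration only):

```python
import string

data = "this variable contains some emoji'sツ😂"

# Keep everything in the BMP: removes 😂 (U+1F602) but keeps ツ (U+30C4).
bmp_only = "".join(c for c in data if c <= "\uFFFF")

# Keep only printable ASCII: removes both ツ and 😂.
ascii_printable = "".join(c for c in data if c in string.printable)

print(bmp_only)         # this variable contains some emoji'sツ
print(ascii_printable)  # this variable contains some emoji's
```

The BMP filter is enough to avoid the Tk error; the ASCII filter matches the exact output requested in the question but would also discard any other non-ASCII text.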

For emoji inside the BMP, see the related question: removing emojis from a string in Python
