

How to handle Unicode character issues through the Python Snowflake connector while reading data from Snowflake

While reading data from Snowflake using the Python Snowflake connector, I am getting the following error:

"InterfaceError: 252005: Failed to convert current row, cause: 'utf-8' codec can't decode byte 0xe1 in position 316: invalid continuation byte"

The string contains non-UTF-8 characters, and the Snowflake cursor is unable to return the value. How can I handle this situation? I need the content.
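The same failure can be reproduced in plain Python, which helps explain the message. The byte string below is made up for illustration and is not the asker's data: 0xE1 is the lead byte of a three-byte UTF-8 sequence, so when an ordinary ASCII byte follows it, the decoder reports an "invalid continuation byte". Decoded as Latin-1, the same byte is "á", which is consistent with the á characters in the sample data, although that is an inference rather than something the question confirms.

```python
# Hypothetical bytes: "s" + 0xE1 + "x", as a Latin-1 encoder would write "sáx".
raw = b"s\xe1x"


def try_utf8(data: bytes) -> str:
    """Decode data as UTF-8, or describe why strict decoding failed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte; exc.reason is the cause
        return f"byte 0x{data[exc.start]:x} at position {exc.start}: {exc.reason}"


print(try_utf8(raw))          # byte 0xe1 at position 1: invalid continuation byte
print(raw.decode("latin-1"))  # sáx
```

This mirrors the connector's message ("can't decode byte 0xe1 ... invalid continuation byte"); only the position differs because the real string is longer.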

Python version: 3.7.6
Snowflake Python connector version: 5.5.1

Sample code:

import snowflake.connector

ctx = snowflake.connector.connect(
    user='user', password='pwd', account='act', warehouse='wrh',
    database='db', schema='schema', role='role',
)
cur = ctx.cursor()
cur.arraysize = 10000
sql = """select longText from db.schema.table where textId = 1279"""
cur.execute(sql)
for element in cur:  # the InterfaceError above is raised while rows are converted here
    print(element[0])

Sample data:

xxxx: xxxxxxxx@xxxxx.xxx Tx: xttxxh@xxxxx.xxx xx: xuxxxxt: xx: xxx00x0x3 : xxxxxxx0x= // Hxvx x xxxx xx-Xxxxx-xxx thxt xxxt xxxx. xxgx xhxx thxt thx xhxxxx (Uxxxxxxxxxxx) -----xxxgxxxx xxxxxgx----- xxxx: xxxx, xxux x xxV (Ux) [xxxxtx:xxux.x.xxxx.xxv@xxxx.xxx] xxxt: xxxxxxxxy, xxvxxxxx xx, x01x 1x:1x xx Tx: xxxxxáx xx xxxáxxx xx: xxxx, xxux x xxV (Ux) xuxxxxt: xx: xxx00x0x3 : xxxxxxx0x= // Hxvx x xxxx xx-Xxxxx-xxx thxt xxxt xxxx. xxgx xhxx thxt thx xhxxxx (Uxxxxxxxxxxx)

Instead of select longText, try select hex_encode(longText). This transforms whatever strange binary text lives inside that field into a string that is safe to transport. You can then decode it back within the safety of Python.
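A minimal sketch of that approach, assuming a DB-API 2.0 cursor such as the one returned by ctx.cursor() in the question. The helper name fetch_hex_column and the FakeCursor stand-in are hypothetical and not part of the connector's API. Each HEX_ENCODE result arrives as a plain ASCII hex string, which bytes.fromhex turns back into the original raw bytes, so nothing is decoded as UTF-8 until you choose an encoding yourself.

```python
from typing import Any, Iterator, Optional


def fetch_hex_column(cursor: Any, sql: str) -> Iterator[Optional[bytes]]:
    """Run a query whose first column is HEX_ENCODE(...) and yield raw bytes.

    `cursor` is any DB-API 2.0 cursor (for example a snowflake.connector
    cursor). This helper is hypothetical, not part of the connector.
    """
    cursor.execute(sql)
    for row in cursor:
        hex_text = row[0]
        # NULL stays None; anything else is ASCII hex such as '68656C6C6F'
        yield None if hex_text is None else bytes.fromhex(hex_text)


class FakeCursor:
    """Stand-in cursor so the sketch runs without a Snowflake account."""

    def __init__(self, rows):
        self._rows = rows

    def execute(self, sql):
        self.last_sql = sql

    def __iter__(self):
        return iter(self._rows)


cur = FakeCursor([("68656C6C6F",), (None,)])
print(list(fetch_hex_column(cur, "select hex_encode(longText) from t")))
# [b'hello', None]
```

With the real connection from the question, you would pass ctx.cursor() and a query such as select hex_encode(longText) from db.schema.table where textId = 1279, then decode each bytes value with whichever encoding turns out to be correct.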

For example, in SQL:

select hex_encode('hello')

That returns 68656C6C6F, which you can decode back in Python:

>>> bytes.fromhex('68656C6C6F')
b'hello'

This technique will also give you some insight into which odd character is tripping up your code, and which encoding to use to decode it.


>>> bytes.fromhex('68656C6C6F')
b'hello'
>>> bytes.fromhex('F09F9988')
b'\xf0\x9f\x99\x88'
>>> bytes.fromhex('F09F9988').decode('utf-8')
'🙈'
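Once you have the raw bytes (for example from bytes.fromhex as above), one way to act on what you learn is to try a short list of candidate encodings in order. The sketch below tries strict UTF-8 first and then cp1252, the Windows superset of Latin-1 in which 0xE1 is "á". The helper name decode_best_effort and the choice and order of encodings are assumptions about where the data came from, so adjust the list once the hex output tells you more. If every candidate fails, it falls back to UTF-8 with U+FFFD replacement characters rather than dropping the row.

```python
from typing import Sequence, Tuple


def decode_best_effort(
    data: bytes, encodings: Sequence[str] = ("utf-8", "cp1252")
) -> Tuple[str, str]:
    """Decode bytes with the first encoding that succeeds in strict mode.

    Returns (text, encoding_name). The candidate list is an assumption about
    the data's origin; if every candidate fails, bad bytes become U+FFFD.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return data.decode("utf-8", errors="replace"), "utf-8 (replace)"


print(decode_best_effort(bytes.fromhex("F09F9988")))  # ('🙈', 'utf-8')
print(decode_best_effort(bytes.fromhex("73E178")))    # ('sáx', 'cp1252')
```

Keep in mind that cp1252 will decode many byte sequences that were really written in some other single-byte encoding, so a successful decode is a plausible guess, not proof of the original encoding.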
