Python 3 - CSV and cx_Oracle

I'm having serious trouble working with the csv and cx_Oracle modules. I want to read a CSV file that is saved in UTF-8 (I made sure of this by re-saving it as UTF-8 in Notepad). Reading the file now works fine (it did not before I saved it as UTF-8). This is my code to read the CSV file:

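import csv

data = []
# newline='' lets the csv module handle line endings inside quoted fields
with open(file, 'rt', encoding='utf-8', newline='') as csvfile:
    csvinput = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvinput:
        data.append(row)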

This saves everything to a 2D array (a list of rows). Whenever I want to insert something into the database, I create a prepared statement and bind the text to it like this:

data = [lastname, firstname]  # bind values for :1 and :2
cursor = connection.cursor()
cursor.prepare("SELECT * FROM PRIVATE WHERE NAME = :1 AND FIRSTNAME = :2")
cursor.execute(None, data)  # None re-uses the statement prepared above
res = cursor.fetchall()
cursor.close()

It gives me lots of errors like:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)

I tried reading up on the whole thing, but I got rather confused by Unicode, as I don't really know what I should use where, and why. Any help is appreciated.

TL;DR: I get encoding errors while trying to execute prepared statements.

You are trying to insert Unicode values into a VARCHAR2 column, which can only handle encoded byte strings.

cx_Oracle is trying to encode your Unicode values for you to fit the column type, and it does so with the default codec for your connection. Judging by the error message, that codec is ASCII, which cannot represent characters such as 'é'.
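The error message from the question can be reproduced without a database. This is a minimal sketch that assumes the connection's default codec is ASCII, as the traceback suggests. The sample name and the `try_encode` helper are made up for illustration.

```python
# Hypothetical sample value; any string with a non-ASCII character behaves the same.
lastname = "Pérez"


def try_encode(value, codec):
    """Return the encoded bytes, or the error message if the codec cannot represent the value."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


ascii_result = try_encode(lastname, "ascii")  # fails: 'é' is outside the ASCII range
utf8_result = try_encode(lastname, "utf-8")   # succeeds: 'é' becomes two bytes

print(ascii_result)
print(utf8_result)
```

The first line printed is the same "ordinal not in range(128)" message as in the question. The second shows that the same value encodes without trouble as UTF-8.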

Either encode your values manually to a suitable encoding, or make your columns use NVARCHAR2 instead.
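A related option, not spelled out above, is to change the connection's codec so that cx_Oracle itself encodes the values as UTF-8. The sketch below assumes a cx_Oracle version whose `connect()` accepts the `encoding` and `nencoding` keyword arguments. On older versions the client character set is usually taken from the `NLS_LANG` environment variable instead. The helper names and the credentials in the usage comment are placeholders, and the database character set still has to be able to store the characters.

```python
def utf8_connect_kwargs():
    """Keyword arguments asking cx_Oracle to use UTF-8 for CHAR and NCHAR data."""
    return {"encoding": "UTF-8", "nencoding": "UTF-8"}


def connect_utf8(user, password, dsn):
    """Open a connection whose bind values are encoded as UTF-8 (hypothetical helper)."""
    import cx_Oracle  # imported lazily so the sketch loads without the driver installed

    return cx_Oracle.connect(user, password, dsn, **utf8_connect_kwargs())


# Usage (placeholders, not real credentials):
# connection = connect_utf8("scott", "tiger", "localhost/orclpdb1")
```

With a connection opened this way, the prepared statement from the question can stay unchanged, because the bind values are still ordinary Python strings.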

The latter has the added advantage that column lengths are expressed in characters, not bytes. UTF-8 data can use up to 4 bytes per character, so a VARCHAR2(1000) column could, in the worst case, fit only 250 actual characters.
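The byte arithmetic can be checked directly in Python. This is a small sketch with sample characters chosen for illustration, assuming the VARCHAR2(1000) column measures its length in bytes.

```python
# Sample characters that need 1, 2, 3 and 4 bytes in UTF-8.
samples = ["e", "é", "€", "😀"]

# Map each character to the number of bytes its UTF-8 encoding occupies.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

column_bytes = 1000  # capacity of VARCHAR2(1000) under byte-length semantics
worst_case_chars = column_bytes // max(byte_lengths.values())

print(byte_lengths)      # {'e': 1, 'é': 2, '€': 3, '😀': 4}
print(worst_case_chars)  # 250
```

A string's `len()` counts characters, while the length of its encoded form counts bytes, and the bytes are what a byte-sized column limits.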
