
How to query unicode database with ascii characters

I am currently running a query on my PostgreSQL database that ignores German characters (umlauts). However, I do not want to lose these characters and would rather have the German characters, or at least their equivalents (e.g. ä = ae), in the query output. I am running Python 2.7.12.
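If the transliterated form is good enough, the ä = ae mapping can be expressed with `translate` and a lookup table. A minimal sketch, assuming only the German umlauts and ß need replacing; the helper and table names are hypothetical:

```python
# -*- coding: utf-8 -*-
# Hypothetical lookup table: German umlauts and ß mapped to ASCII digraphs
GERMAN_TRANSLIT = {
    ord(u"ä"): u"ae", ord(u"ö"): u"oe", ord(u"ü"): u"ue",
    ord(u"Ä"): u"Ae", ord(u"Ö"): u"Oe", ord(u"Ü"): u"Ue",
    ord(u"ß"): u"ss",
}


def transliterate_german(text):
    """Return text with umlauts and ß replaced by their ASCII equivalents."""
    return text.translate(GERMAN_TRANSLIT)


print(transliterate_german(u"Zwölf große Boxkämpfer"))
```

This loses information (the digraphs cannot be mapped back reliably), so it only makes sense as a fallback when the output really must be ASCII.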

When I change the error handler passed to encode() to 'replace' or 'xmlcharrefreplace', I get the following error:

psycopg2.ProgrammingError: syntax error at or near "?"
LINE 1: ?SELECT

Code Snippet:

# -*- coding: utf-8 -*-
import codecs
import os

import psycopg2

# user, password, host, database, SQL_LOC and files are defined elsewhere
connection_str = r'postgresql://' + user + ':' + password + '@' + host + '/' + database

def query_db(conn, sql):
    with conn.cursor() as curs:
        curs.execute(sql)
        rows = curs.fetchall()

    print("fetched %s rows from db" % len(rows))

    return rows

with psycopg2.connect(connection_str) as conn:
    for filename in files:
        # Read SQL, replacing every non-ASCII character with '?'
        sql = u""

        with codecs.open(os.path.join(SQL_LOC, filename), "r", "utf-8") as f:
            for line in f:
                sql += line.encode('ascii', 'replace').replace('\r\n', ' ')

        rows = query_db(conn, sql)

How can I pass a query as a unicode object containing German characters? I also tried decoding the query as UTF-8, but then I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
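One plausible reading of the "?SELECT" error, assuming the SQL file was saved with a UTF-8 byte-order mark (BOM): Python's "utf-8" codec keeps the BOM as the character U+FEFF at the start of the text, the 'replace' handler turns it into '?', and PostgreSQL then sees "?SELECT". The "utf-8-sig" codec strips a leading BOM. (Separately, in Python 2 calling .decode() on a unicode object first encodes it with the default ASCII codec, which is one way to get a UnicodeEncodeError such as the one above for u'\xa0', a no-break space.) A minimal Python 3 sketch of the BOM mechanism, using hypothetical in-memory file content rather than the actual file:

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a query with an umlaut
raw = codecs.BOM_UTF8 + u"SELECT 'Zwölf'\r\n".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as the character U+FEFF
text = raw.decode("utf-8")
assert text.startswith(u"\ufeff")

# The ascii/'replace' step turns both the BOM and the umlaut into '?'
ascii_sql = text.encode("ascii", "replace")
assert ascii_sql == b"?SELECT 'Zw?lf'\r\n"

# "utf-8-sig" drops a leading BOM and keeps the umlaut intact
clean = raw.decode("utf-8-sig")
assert clean == u"SELECT 'Zwölf'\r\n"
```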

Here is a solution that returns an encoded equivalent of the characters instead. The result can be re-encoded later, and the query will not raise an error:

SELECT convert_from(BYTEA 'foo ᚠ bar'::bytea, 'latin-1');
+--------------------+
| convert_from       |
|--------------------|
| foo á<U+009A>  bar |
+--------------------+
SELECT 1
Time: 0.011s
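This works because Latin-1 maps every byte from 0x00 to 0xFF to exactly one character, so decoding UTF-8 bytes as Latin-1 never fails, and re-encoding the result as Latin-1 gives back the original bytes. A minimal Python 3 sketch of that round trip, mirroring the query above (the variable names are illustrative):

```python
original = u"foo \u16a0 bar"            # 'foo ᚠ bar'
utf8_bytes = original.encode("utf-8")   # ᚠ is stored as the bytes E1 9A A0

# Decoding as Latin-1 cannot fail: each byte becomes one character
mojibake = utf8_bytes.decode("latin-1")
assert mojibake == u"foo \xe1\x9a\xa0 bar"  # 'á', U+009A, no-break space

# Re-encoding as Latin-1 recovers the exact bytes, which decode as UTF-8
restored = mojibake.encode("latin-1").decode("utf-8")
assert restored == original
```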

You just need to call conn.set_client_encoding("utf-8"); after that you can execute unicode strings directly, and both the SQL and the results will be encoded and decoded on the fly:

$ cat psycopg2-unicode.py
import sys
import csv

import psycopg2

# Empty DSN: connection parameters come from libpq defaults / PG* environment variables
with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        with open(filename, "r", encoding="utf-8") as sql_file:
            sql = sql_file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            if cursor.description is None:
                # No result set (e.g. CREATE TABLE), nothing to write
                continue
            rows = cursor.fetchall()
            with open(filename + ".out", "w", encoding="utf-8", newline="") as outfile:
                csv.writer(outfile, dialect="excel-tab").writerows(rows)

$ cat sql0.sql
create temporary table t(v) as
    select 'The quick brown fox jumps over the lazy dog.'
    union all
    select 'Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.'
    union all
    select 'Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.'
    union all
    select 'Mężny bądź, chroń pułk twój i sześć flag.'
;

$ cat sql1.sql
select * from t;

$ python3 psycopg2-unicode.py sql0.sql sql1.sql

$ cat sql1.sql.out 
The quick brown fox jumps over the lazy dog.
Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.
Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.
Mężny bądź, chroń pułk twój i sześć flag.
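Python 3's csv module works on str (Unicode) values directly, which is why this version needs no per-value encoding. A minimal in-memory check, with io.StringIO standing in for the .out file and two of the sample rows above:

```python
import csv
import io

# Two of the sample rows, as one-column tuples like cursor.fetchall() returns
rows = [
    ("Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.",),
    ("Mężny bądź, chroń pułk twój i sześć flag.",),
]

# io.StringIO stands in for the output file; newline="" as the csv docs recommend
buffer = io.StringIO(newline="")
csv.writer(buffer, dialect="excel-tab").writerows(rows)

# Read the tab-separated text back and compare with the input rows
buffer.seek(0)
read_back = [tuple(r) for r in csv.reader(buffer, dialect="excel-tab")]
assert read_back == rows
```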

A Python 2 version of this program is a little more complicated, because we need to tell the driver that we'd like text values returned as unicode objects. Also, the Python 2 csv module I used for output does not support unicode, so each value has to be encoded to UTF-8 before it is written. Here it is:

$ cat psycopg2-unicode2.py
import sys
import csv
import codecs

import psycopg2
import psycopg2.extensions
# Make the driver return text columns as unicode objects instead of byte strings
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        with codecs.open(filename, "r", encoding="utf-8") as sql_file:
            sql = sql_file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            if cursor.description is None:
                # No results from SQL
                continue
            rows = cursor.fetchall()
            with open(filename + ".out", "wb") as outfile:
                writer = csv.writer(outfile, dialect="excel-tab")
                for row in rows:
                    # Python 2 csv needs byte strings: encode text, pass other values through
                    writer.writerow([v.encode('utf-8') if isinstance(v, unicode) else v
                                     for v in row])

The technical posts on this site are published under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address.

© 2020-2024 STACKOOM.COM