
Python PostgreSQL CSV Import empty field sorter

I am trying to import a fairly large CSV file (21 columns / 125k rows) into Postgres. Since you cannot insert an empty string into Postgres the way you can with SQLite, I am trying to go through each row with a csv.DictReader and filter the data, so that I can build an INSERT statement for only the columns/fields that contain data. The filtering works well, but when I try to create the INSERT statement, it tries to insert the array instead of each value... Please don't suggest other approaches such as PostgreSQL's COPY. Thank you.

with codecs.open(filename, 'rb', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    a=0
    col=[]
    val=[]
    for row in reader:
        if a>0:
            # collect only the columns that have a value in this row
            for column, value in row.items():
                if value != '':
                    #print column, value
                    col.append(column)
                    val.append(value)
            try:
                # this is the call that raises the error shown below
                c.execute('''INSERT INTO AMA (%s) VALUES (%s) ON CONFLICT DO NOTHING''',(col,val,))
            except psycopg2.IntegrityError as e:
                print e

            # reset the lists for the next row
            col=[]
            val=[]
        a=a+1

psycopg2.ProgrammingError: syntax error at or near "ARRAY" LINE 1: INSERT INTO AMA (ARRAY['fulfillment-id', 'sku', 'settleme...
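
The ARRAY in this message appears because psycopg2 adapts a Python list passed as a single parameter to a PostgreSQL ARRAY literal, so each of the two %s slots receives a whole list. Below is a minimal sketch (Python 3) of the usual shape, with one placeholder per value. The helper names `non_empty_fields` and `value_placeholders` and the sample data are made up for illustration. Note that column names still cannot be sent as query parameters.

```python
import csv
import io


def non_empty_fields(row):
    """Return (columns, values) for the fields of one DictReader row that are not empty."""
    columns = [column for column, value in row.items() if value != '']
    values = [row[column] for column in columns]
    return columns, values


def value_placeholders(count):
    """Build '%s, %s, ...' with one placeholder per value to insert."""
    return ', '.join(['%s'] * count)


# Tiny tab-separated sample standing in for the real file (hypothetical data).
sample = io.StringIO("sku\tcurrency\tquantity\nA1\t\t3\n")
reader = csv.DictReader(sample, delimiter='\t')
row = next(reader)

columns, values = non_empty_fields(row)
print(columns)                          # ['sku', 'quantity']
print(value_placeholders(len(values)))  # %s, %s
```

The values would then be passed as a flat tuple, for example `tuple(values)`, so that each one fills its own placeholder.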

I managed to get this far, but now there is a different problem:

with codecs.open(filename, 'rb', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    a=0
    col=[]
    val=[]
    for row in reader:
        # collect only the columns that have a value in this row
        for column, value in row.items():
            if value != '':
                col.append(column)
                val.append(value)
        try:
            query='''INSERT INTO AMA %s VALUES %s ON CONFLICT DO NOTHING'''
            # both tuples are rendered as quoted values, including the column names
            print c.mogrify(query, (tuple(col), tuple(val)))
            c.execute(query, (tuple(col), tuple(val),))
        except psycopg2.IntegrityError as e:
            print e

        # reset the lists for the next row
        col=[]
        val=[]
        a=a+1

psycopg2.ProgrammingError: syntax error at or near "'currency'" LINE 1: INSERT INTO AMA ('currency', 'settlement-id', 'deposit-da

It looks like I need " " around the column names in PostgreSQL instead of ' '. What can I do to change that?
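
The single quotes appear because psycopg2 renders every query parameter as a value literal, never as an identifier, so the tuple of column names is quoted like strings. psycopg2 2.7 and later provide `psycopg2.sql.Identifier` for column and table names. As a rough illustration of what a quoted identifier looks like, here is a pure-Python sketch of the standard SQL rule (wrap the name in double quotes and double any embedded double quote). `double_quote_name` is a hypothetical helper for illustration only, not part of psycopg2.

```python
def double_quote_name(name):
    """Quote a column name as an SQL identifier: "name", with any " doubled."""
    return '"' + name.replace('"', '""') + '"'


# Column names taken from the error messages in the question.
columns = ['currency', 'settlement-id', 'sku']
column_list = ', '.join(double_quote_name(c) for c in columns)
print(column_list)  # "currency", "settlement-id", "sku"
```

In real code it is probably safer to let `sql.Identifier` do the quoting rather than building the string by hand.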

I found a solution that works. If there are any ideas on how to increase the speed, such as executemany, please let me know.

# requires: from psycopg2 import sql (psycopg2 2.7+)
with codecs.open(filename, 'rb', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    a=0
    col=[]
    val=[]
    for row in reader:
        # collect only the columns that have a value in this row
        for column, value in row.items():
            if value != '':
                col.append(column)
                val.append(unicode(value, "utf8"))
        try:
            # sql.Identifier double-quotes each column name; one placeholder per value
            query1=sql.SQL("INSERT INTO AMA ({}) VALUES ({}) ON CONFLICT DO NOTHING").format(sql.SQL(', ').join(map(sql.Identifier, col)),sql.SQL(', ').join(sql.Placeholder() * len(col)))
            query=c.mogrify(query1, tuple(val),)
            #print query
            c.execute(query)
        except psycopg2.IntegrityError as e:
            print e

        # reset the lists for the next row
        col=[]
        val=[]
        a=a+1

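Using psycopg2's sql module, it is possible to build correctly formatted SQL, with the identifiers (here, the column names) in "" and the values in ''. After that, you only need to build lists of the columns and values that actually contain data.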
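
On the speed question, one possible direction is to stop building and sending one statement per row. The following is a hedged sketch (Python 3), assuming psycopg2 2.7+ and a cursor like `c` in the code above. Rows are grouped by which columns are non-empty, so each group shares one INSERT statement, and each group is sent with `psycopg2.extras.execute_batch`, which packs many statements into fewer round trips. (The psycopg2 documentation notes that `cursor.executemany` is currently no faster than calling `execute` in a loop.) The names `group_rows_by_columns` and `insert_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_rows_by_columns(rows):
    """Map each tuple of non-empty column names to the value tuples that share it."""
    groups = defaultdict(list)
    for row in rows:
        columns = tuple(column for column, value in row.items() if value != '')
        if columns:  # skip rows where every field is empty
            groups[columns].append(tuple(row[column] for column in columns))
    return dict(groups)


def insert_groups(cursor, groups):
    """Send one batched INSERT per column set (psycopg2 is imported lazily)."""
    from psycopg2 import sql
    from psycopg2.extras import execute_batch

    for columns, value_rows in groups.items():
        # Same statement shape as the working solution above, built once per group.
        query = sql.SQL("INSERT INTO AMA ({}) VALUES ({}) ON CONFLICT DO NOTHING").format(
            sql.SQL(', ').join(map(sql.Identifier, columns)),
            sql.SQL(', ').join(sql.Placeholder() * len(columns)),
        )
        execute_batch(cursor, query, value_rows)


# Made-up rows shaped like the dicts csv.DictReader yields.
rows = [
    {'sku': 'A1', 'currency': '', 'quantity': '3'},
    {'sku': 'B2', 'currency': 'EUR', 'quantity': ''},
    {'sku': 'C3', 'currency': '', 'quantity': '7'},
]
groups = group_rows_by_columns(rows)
print(groups[('sku', 'quantity')])  # [('A1', '3'), ('C3', '7')]
```

Calling `insert_groups(c, groups)` and then committing on the connection would take the place of the per-row `c.execute`. Whether this is actually faster for a given table and network would need to be measured.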

