
How to append value if primary key or id is same in Postgres using Python

I'm trying to insert around 50 million records into PostgreSQL using a Python script. I have a file that contains the 50 million records. I'm completely new to both PostgreSQL and Python. I tried the code below to do the insert, and I'm facing one challenge. My text file contains key-value pairs, as shown below.

If the same key appears twice in the text file, I want to append the new value to the existing one. I'm not sure how to do that in Python. Can someone please help?

myfile.txt

key1 item1,product1,model1,price1|
key2 item2,product2,model2,price2|
key3 item3,product3,model3,price3|
key4 item4,product4,model4,price4|
key2 item22,product22,model22,price22|

In this case key2 has two records. While inserting into the DB, I have to append the second value to the first one.

Expected table contents:

key  value
key1 item1,product1,model1,price1|
key2 item2,product2,model2,price2|item22,product22,model22,price22|
key3 item3,product3,model3,price3|
key4 item4,product4,model4,price4|

insert.py

import psycopg2

def insertToDB(fileName):
  conn = psycopg2.connect("dbname='mydb' user='testuser' host='localhost'")
  with open(fileName) as f:
     for line in f:
       key,value = line.split(' ',1)
       cursor = conn.cursor()
       query = "INSERT INTO mytable (key,value) VALUES (%s,%s);"
       data = (key,value)
       cursor.execute(query,data)
       conn.commit()

insertToDB('myfile.txt')

I have around 50 million records, and most keys may be repeated with different records. How should I handle that, and how can I write to the DB efficiently?

It would be really helpful if someone could suggest how to improve this.

Thank you!

The easiest way is to use the ON CONFLICT clause of the SQL INSERT statement. This changes your simple insert into an "upsert" (insert or update).

ON CONFLICT requires PostgreSQL version 9.5 or greater, and is used like this:

query = """INSERT INTO mytable (key,value)
           VALUES (%s,%s)
           ON CONFLICT (key)
           DO UPDATE SET value = CONCAT(mytable.value, %s);"""
cursor.execute(query, (key, value, value))
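
For a file of this size, the same upsert can be sent in batches with one commit per batch instead of one per row. The following is only a minimal sketch under some assumptions: `key` is the primary key (or has a unique constraint) of `mytable`, every non-blank line contains a space, and the helper names `parse_line`, `batches`, `upsert_file` and the default `batch_size` are made up for illustration. It uses `EXCLUDED`, which refers to the row proposed for insertion, so the value is passed only once.

```python
from itertools import islice

# EXCLUDED is the row that was proposed for insertion, so the new value
# does not have to be passed as a third parameter.
UPSERT_SQL = """
    INSERT INTO mytable (key, value)
    VALUES (%s, %s)
    ON CONFLICT (key)
    DO UPDATE SET value = CONCAT(mytable.value, EXCLUDED.value);
"""

def parse_line(line):
    """Split 'key value' on the first space and drop the trailing newline."""
    key, value = line.rstrip("\n").split(" ", 1)
    return key, value

def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def upsert_file(conn, filename, batch_size=10000):
    """Upsert every non-blank line, committing once per batch (hypothetical helper)."""
    cursor = conn.cursor()
    with open(filename) as f:
        # generator: the file is streamed, never loaded whole into memory
        rows = (parse_line(line) for line in f if line.strip())
        for batch in batches(rows, batch_size):
            cursor.executemany(UPSERT_SQL, batch)
            conn.commit()
```

It would be called as `upsert_file(conn, 'myfile.txt')` with a connection opened by `psycopg2.connect(...)`. The batch size is a guess and would need tuning against the real data.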

The other option is to concatenate your results before you send them to the database, by restructuring your data. Here I am collecting all rows by key in a dictionary, and then when inserting I just join all the values together.

This way, you only have one insert for each key.

Here is some code to explain this:

from collections import defaultdict
import psycopg2

def get_records(filename):
   records = defaultdict(list)
   with open(filename) as f:
     for line in f:
        if line.strip():
          # strip the trailing newline so joined values stay on one line
          key, value = line.rstrip('\n').split(' ',1)
          records[key].append(value)
   return records

def insert_records(records, conn):
   q = "INSERT INTO mytable (key, value) VALUES (%s, %s);"
   cursor = conn.cursor()
   for key, data in records.items():
      cursor.execute(q, (key, ''.join(data)))
   # commit once after all rows are inserted, not once per row
   conn.commit()

conn = psycopg2.connect("dbname='mydb' user='testuser' host='localhost'")
insert_records(get_records('myfile.txt'), conn)
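
To check the grouping step without touching the database, the same idea can be run against the sample data from the question. This is a small sketch, not part of the answer's code: `group_lines` and `SAMPLE` are names made up here, and the trailing newline is stripped from each value so the joined result matches the expected table.

```python
from collections import defaultdict
import io

# The five sample lines from the question, key2 appearing twice.
SAMPLE = """\
key1 item1,product1,model1,price1|
key2 item2,product2,model2,price2|
key3 item3,product3,model3,price3|
key4 item4,product4,model4,price4|
key2 item22,product22,model22,price22|
"""

def group_lines(lines):
    """Collect values per key, then join them into one string per key."""
    records = defaultdict(list)
    for line in lines:
        if line.strip():
            key, value = line.rstrip("\n").split(" ", 1)
            records[key].append(value)
    return {key: "".join(values) for key, values in records.items()}

merged = group_lines(io.StringIO(SAMPLE))
print(merged["key2"])
# item2,product2,model2,price2|item22,product22,model22,price22|
```

Each key now maps to a single string, so one INSERT per key is enough.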

If you have a very large number of records, you may exhaust memory by loading the entire file at once.

Instead, you can implement a simpler algorithm that keeps track of the keys that have already been read.

def insert_records(filename, conn):
   seen = set()
   cursor = conn.cursor()
   qi = "INSERT INTO mytable (key, value) VALUES (%s, %s);"
   qu = "UPDATE mytable SET value = CONCAT(value, %s) WHERE key = %s;"

   with open(filename) as f:
     for line in f:
       if line.strip():
         # strip the trailing newline so appended values stay on one line
         key, value = line.rstrip('\n').split(' ', 1)
         if key not in seen:
            # first time we see this key, do an insert
            seen.add(key)
            cursor.execute(qi, (key, value))
         else:
            # key has been processed at least once, do an update
            cursor.execute(qu, (value, key))

         conn.commit()

conn = psycopg2.connect("dbname='mydb' user='testuser' host='localhost'")
insert_records('myfile.txt', conn)
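
The insert-or-update decision can be tried out as a dry run that only reports which statement each line would trigger. This is an illustrative sketch: `plan_statements` and the shortened sample lines are made up here, and like the code above it assumes the table is empty before the run, since a key that already exists in the table but not in `seen` would make the insert fail.

```python
def plan_statements(lines):
    """Yield ('insert' or 'update', key, value) for each non-blank line."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.rstrip("\n").split(" ", 1)
        if key in seen:
            # key was already inserted earlier in this run
            yield ("update", key, value)
        else:
            seen.add(key)
            yield ("insert", key, value)

sample = ["key1 a|\n", "key2 b|\n", "\n", "key2 c|\n"]
for action, key, value in plan_statements(sample):
    print(action, key, value)
# insert key1 a|
# insert key2 b|
# update key2 c|
```

Only the set of keys is held in memory, not the values, which is what makes this lighter than the dictionary approach.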
