
Inserting data into Redshift using Python

I'm trying to insert multiple rows into an Amazon Redshift database. The rows are held in a list of tuples that looks like this:

my_rows=[(1, 0.0, 0, 0.0, 2010188534, 1816780086, 1113834, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 1, 0.0, 2010188536, 1816780086, 1119396, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 2, 0.0, 2010188538, 1816780086, 1119398, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 3, 0.0, 2010188540, 1816780086, 1123612, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 0, 0.0, 2010188542, 1816780102, 1086852, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 1, 0.0, 2010188544, 1816780102, 1087014, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 2, 0.0, 2010188546, 1816780102, 1089224, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 3, 0.0, 2010188548, 1816780102, 1089348, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 4, 0.0, 2010188550, 1816780102, 1122564, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17')]

Some columns may contain None.

I'm inserting them row by row into the Redshift database this way:

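    cur = con.cursor()
    # Column list as a SQL string (concatenating a tuple to a str raises TypeError)
    columns_names = "(c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)"
    # One %s placeholder per column: "(%s,%s,...,%s)"
    values_references = "(" + ",".join(["%s"] * len(my_rows[0])) + ")"
    # Build the statement once instead of on every iteration
    insert_query = "INSERT INTO " + table + " " + columns_names + " VALUES " + values_references + ";"
    for row in my_rows:
        # Reuse the same cursor for every row
        cur.execute(insert_query, row)
    # Commit once at the end (needed unless the connection is in autocommit mode)
    con.commit()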

The problem is that when I run this code, it hangs on the first row without raising any error. So my questions are: Is it normal for inserting a single row to take this long? If not, is there an error in my code? Is there a more efficient way to do this?

Can I get some help, please? Thank you in advance.

The process you should follow:

  1. Write your data in CSV format to an S3 folder, ideally gzipped.
  2. Run a Redshift COPY command to import that data into a temporary table in Redshift.
  3. Run Redshift SQL to insert that data into your table.

That will run fast, is the correct and recommended way, and will scale.
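
The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions: `con` is the same DB-API connection used in the question, `s3_client` is assumed to be a boto3 S3 client, and the bucket, key, IAM role and the `_staging` table suffix are hypothetical names you would replace with your own. The function names are made up for this example.

```python
import csv
import gzip
import io


def rows_to_gzipped_csv(rows):
    """Step 1 (part): serialize tuples to gzip-compressed CSV bytes."""
    text = io.StringIO()
    # csv.writer emits None as an empty field; how COPY interprets an
    # empty field depends on the column type and the COPY options used.
    csv.writer(text, lineterminator="\n").writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_load_statements(table, staging, s3_uri, iam_role):
    """Steps 2 and 3: SQL statements, in the order they should run."""
    return [
        # Temporary table with the same columns as the target table
        f"CREATE TEMP TABLE {staging} (LIKE {table});",
        # Bulk-load the gzipped CSV file from S3 into the temporary table
        f"COPY {staging} FROM '{s3_uri}' IAM_ROLE '{iam_role}' CSV GZIP;",
        # Move the loaded rows into the real table
        f"INSERT INTO {table} SELECT * FROM {staging};",
    ]


def bulk_load(con, s3_client, rows, table, bucket, key, iam_role):
    """Upload rows to S3, then load them into `table` via a temp table."""
    # Step 1: upload the file (s3_client is assumed to be a boto3 S3 client)
    s3_client.put_object(Bucket=bucket, Key=key, Body=rows_to_gzipped_csv(rows))
    cur = con.cursor()
    statements = build_load_statements(
        table, table + "_staging", f"s3://{bucket}/{key}", iam_role
    )
    for statement in statements:
        cur.execute(statement)
    # Commit once so the load is persisted
    con.commit()
```

A call might look like `bulk_load(con, s3_client, my_rows, table, "my-bucket", "loads/my_rows.csv.gz", iam_role)`, where the bucket, key and role values are placeholders. The table name is interpolated directly into the SQL here, so it should come from trusted code rather than user input.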
