Why does populating my table take so long?

I'm loading a csv file into my database via a web form.

The order of the raw data is consistent within each csv file, but it changes from file to file depending on the source, so I have a preview form that shows five rows and lets you assign each column via a drop-down list of valid column names in the table.

Then I use the cgi form to build an INSERT statement, and parse the csv file line-by-line to populate the table.

But it is running EXTREMELY slow. I'm concurrently populating two tables, one with 961402 rows (7 columns with values) and the other with 1835538 rows (1 column with values), and each has been running for at least half an hour. I'm only seeing something like 100 new rows per second.

Can you see anything here that would slow me down?

NOTE: I know there is some ugly code in here; it was one of the first Python cgi scripts I wrote while figuring this language out.

    for item in form:
        field = form.getvalue(item)
        field = cgi.escape(field)
        if field == 'null':
            pass
        elif item == 'csvfile':
            pass
        elif item == 'campaign':
            pass
        elif item == 'numfields':
            pass
        else:
            # remember the table column name and which csv column feeds it
            colname = str(colname) + ", " + str(item)
            colnum.append(field)
    assert(numfields > 0)
    placeholders = (numfields-1) * "%s, " + "%s"
    query = ("insert into %s (%s)" % (table, colname.lstrip(",")))
    with open(fname, 'rb') as f:
        reader = csv.reader(f)
        try:
            record = 0
            errors = 0
            for row in reader:
                try:
                    record = record + 1
                    # build the VALUES clause by hand for this one row
                    data = ''
                    for value in colnum:
                        col = int(value)
                        rawrow = row[col]
                        saferow = rawrow.replace("'", "-")
                        saferow = saferow.replace("-", "")
                        data = str(data) + ", '" + saferow + "'"
                    dataset = data.lstrip(',')
                    insert = query + (" values (%s)" % dataset)
                    cur.execute(insert)
                    con.commit()  # commits once per inserted row
                    print ".",
                except IndexError, e:
                    print "Row:%d file %s, %s<br>" % (reader.line_num, fname.lstrip("./files/"), e)
                    errors = errors + 1
                except csv.Error, e:
                    print "Row:%s file %s, line %d: %s<br>" % (record, fname, reader.line_num, e)
                    errors = errors + 1
                except mdb.Error, e:
                    print "Row:%s Error %d: %s<br>" % (record, e.args[0], e.args[1])
                    errors = errors + 1
                except:
                    t, v, tb = sys.exc_info()
                    print "Row:%s %s<br>" % (record, v)
                    errors = errors + 1
        except csv.Error, e:
            print "except executed<br>"
            sys.exit('file %s, line %d: %s' % (fname, reader.line_num, e))
    print "Successfully loaded %s into Campaign %s, <br>" % (fname.lstrip("./files/"), table)
    print record - errors, "new records.<br>"
    print errors, "errors.<br>"

EDIT/UPDATE: Using LOAD DATA LOCAL INFILE worked like a charm; I loaded 600K records in less than a minute.

The new code is cleaner, too.

    else:
            colnum.append([field, item])  # pair: (csv column index, table column name)

    # sort by csv column position so the column list matches the file's field order
    sortlist = sorted(colnum, key=itemgetter(0))  # itemgetter comes from the operator module
    cols = ''
    for colname in sortlist:
        cols = cols + "%s, " % colname[1]
    cur.execute("LOAD DATA LOCAL INFILE '%s' IGNORE INTO TABLE %s FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (%s)" % (fname, table, cols.rstrip(', ')))
    con.commit()

The only catch is that I have to do a smidge more work preparing my csv files to ensure data integrity; otherwise, it works like a charm.
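That prep step can stay in Python. A minimal sketch, assuming the cleanup is just normalizing fields into a temporary copy before LOAD DATA runs (the clean_fname name and the strip() cleanup are hypothetical examples, not the actual prep):

    import csv

    clean_fname = fname + ".clean"  # hypothetical temp copy of the upload
    with open(fname, 'rb') as src, open(clean_fname, 'wb') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # strip stray whitespace so LOAD DATA sees consistent fields
            writer.writerow([field.strip() for field in row])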

INSERT INTO, done one row at a time, is pretty slow, considering that some SQL databases, such as MySQL, support either putting a batch of rows in a single INSERT command or LOAD DATA statements that read CSV files quickly into the server.
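As a rough illustration, here is a minimal sketch of the batched alternative using MySQLdb (the driver the question imports as mdb); the connection parameters and the campaign table with columns a and b are hypothetical stand-ins:

    import MySQLdb as mdb

    con = mdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
    cur = con.cursor()

    # collect all rows first, then hand them to the driver in one call;
    # MySQLdb can rewrite a simple VALUES insert into one multi-row statement
    rows = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]
    cur.executemany("INSERT INTO campaign (a, b) VALUES (%s, %s)", rows)
    con.commit()  # one commit for the whole batch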

See also: https://dba.stackexchange.com/questions/16809/why-is-load-data-infile-faster-than-normal-insert-statements

Some quick pseudocode. Do this:

for row in data_to_be_inserted:
    stmt = compose_statement("lalala")
    cursor.execute(stmt)

connection.commit()

not

for row in data_to_be_inserted:
    stmt = compose_statement("lalala")
    cursor.execute(stmt)
    connection.commit()

Your code commit()s once per line of input. That slows it down significantly.
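Applied to the question's loop, the fix is just hoisting con.commit() out of the per-row block. A minimal sketch reusing the cur, con, and fname names from above, with a hypothetical two-column campaign table:

    import csv

    with open(fname, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # parameterized query: the driver quotes the values itself,
            # which also replaces the manual replace("'", ...) escaping
            cur.execute("INSERT INTO campaign (a, b) VALUES (%s, %s)",
                        (row[0], row[1]))
    con.commit()  # one commit for the whole file, not one per row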
