Creating and populating a table is slow and unstable

When using csv_copy to create/populate a table, I notice it is sometimes extremely slow. The core code and some sample outputs follow.

I have two questions:

  1. I can't figure out why the time for creating and populating tables varies so much.
  2. I am not sure what causes "None" to be printed.

Code:

import StringIO
import psycopg2

# conn (an open psycopg2 connection), file (path of the input file) and
# timer() are module-level names defined elsewhere in the script.
def create_populate_table(table_name,fields,types,cur):
    # build the CREATE TABLE statement; the first field is the primary key
    sql = 'CREATE TABLE IF NOT EXISTS ' + table_name + ' (\n'
    for i in xrange(len(fields)):
        if i==0:
            sql += fields[i]+' '+types[i]+' NOT NULL PRIMARY KEY,\n'
        elif i==len(fields)-1:
            sql += fields[i]+' '+types[i]+')'
        else:
            sql += fields[i]+' '+types[i]+',\n'
    #print sql
    cur.execute(sql)
    conn.commit()
    print "Table ",table_name," created ",timer()

    # skip population if the table already holds rows
    cur.execute("SELECT count(*) from "+table_name)
    if cur.fetchone()[0]>0:
        return
    # populate data into created table
    fr= open(file, 'r')
    fr.readline()  # skip the header line
    # parse and convert data into unicode
    #data = unicode_csv_reader(fr, delimiter='|')
    # anything can be used as a file if it has .read() and .readline() methods
    data = StringIO.StringIO()
    s=''.join(fr.readlines())
    # normalize Windows line endings
    while(s.find('\r\n')!=-1):
        s=s.replace('\r\n','\n')
    #timer()
    # fill empty fields with 0; loop because runs like '|||' need several passes
    while(s.find('||')!=-1 or s.find('|\n')!=-1 ):
        s=s.replace('||','|0|')
        s=s.replace('|\n','|0\n')
    #timer()
    #print s.split('\t')[:2]
    #exit(0)
    data.write(s)
    data.seek(0)
    try:
        cur.copy_from(data, table_name,sep='|')
        conn.commit()
        print "Table ",table_name," populated ",timer()
    except psycopg2.DatabaseError as e:
        if conn:
            conn.rollback()
        print 'Error %s' % e
    fr.close()

The outputs I see:

ME_Features_20121001.txt Table ME_Features_20121001 created 1.44s None Table ME_Features_20121001 populated 1.48s None

FM_Features_20121001.txt Table FM_Features_20121001 created 67.92s None Table FM_Features_20121001 populated 0.22s None

NationalFile_20121001.txt (700mb) Table NationalFile_20121001 created 9.34s None Table NationalFile_20121001 populated 4963.18s None

NJ_Features_20121001.txt Table NJ_Features_20121001 created 1.65s None Table NJ_Features_20121001 populated 41.11s None

PW_Features_20121001.txt Table PW_Features_20121001 created 1.73s None Table PW_Features_20121001 populated 0.20s None

How is timer() defined? My blind guess (as you didn't provide its code) is that this function calls print directly to output the measured time but doesn't explicitly return anything, hence None is printed. If it's still unclear, look at the example below:

>>> def test():
...     print 'test'
... 
>>> print 'This is a', test()
This is a test
None
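
If that guess is right, one possible fix is to have the timing helper return its reading instead of printing it. The sketch below is only an illustration: timer_printing and timer_returning are hypothetical stand-ins, since the real timer() was not posted, and print is written in its function form so the snippet also runs on Python 3.

```python
import time

_last = time.time()  # moment of the previous reading (hypothetical module state)

def timer_printing():
    """Guessed shape of the original helper: prints, implicitly returns None."""
    global _last
    now = time.time()
    print("%.2fs" % (now - _last))
    _last = now

def timer_returning():
    """Alternative: hand the formatted reading back to the caller."""
    global _last
    now = time.time()
    elapsed = now - _last
    _last = now
    return "%.2fs" % elapsed

# The printing variant yields None, which is what ends up in the output
# when its result is passed to print.
result = timer_printing()

# The returning variant can be embedded in a message without a stray None.
message = "Table demo created " + timer_returning()
print(message)
```

With the returning variant, the caller decides where the reading appears, and nothing extra is printed.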

I'm not sure what you mean by saying that the time varies for creating and populating tables. The time needed to populate a table obviously depends on the amount of data to insert. The time needed to create a table should be more or less the same in each case, so the 67.92s output does look suspicious, but... are you sure it's measured properly?

Again, my blind guess is that timer() prints the time since the last call. Perhaps you should explicitly reset it before starting the operation you want to measure? I guess that those 60 seconds were spent before calling create_populate_table()...
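
To illustrate that guess: with a timer that reports the time since its previous call, any work done before the measured step is charged to the first reading taken after it. The sketch below assumes such a helper (since_last_call is hypothetical, as are create_table_stub and the sleeps standing in for real work) and contrasts it with taking an explicit start time right before the operation.

```python
import time

_last = time.time()  # hypothetical module state: time of the previous reading

def since_last_call():
    """Return seconds elapsed since the previous call (or since start-up)."""
    global _last
    now = time.time()
    elapsed = now - _last
    _last = now
    return elapsed

def create_table_stub():
    """Stand-in for the CREATE TABLE step; deliberately very quick."""
    time.sleep(0.01)

# Unrelated work before the measured step (e.g. preparing the next file).
time.sleep(0.2)

create_table_stub()
misleading = since_last_call()   # also includes the earlier 0.2 s of work

# Explicit measurement: take the start time immediately before the step.
start = time.time()
create_table_stub()
explicit = time.time() - start   # covers only the step itself

print("since last call: %.2fs" % misleading)
print("explicit start:  %.2fs" % explicit)
```

If the real timer() behaves like since_last_call, calling it once just before the step (and discarding the result) would reset it in the same way.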

