Creating and populating tables is slow and inconsistent

Question

When using csv_copy to create and populate a table, I notice that it is sometimes extremely slow. The core code and some sample output are below.

I have two questions:

I can't figure out why the time taken to create and populate the tables varies so much.
I am not sure what causes "None" to be printed.

Code:

def create_populate_table(table_name,fields,types,cur):
    sql = 'CREATE TABLE IF NOT EXISTS ' + table_name + ' (\n'
    for i in xrange(len(fields)):
        if i==0:
            sql += fields[i]+' '+types[i]+' NOT NULL PRIMARY KEY,\n'
        elif i==len(fields)-1:
            sql += fields[i]+' '+types[i]+')'
        else:
            sql += fields[i]+' '+types[i]+',\n'
    #print sql
    cur.execute(sql)
    conn.commit()
    print "Table ",table_name," created ",timer()

    cur.execute("SELECT count(*) from "+table_name)
    if cur.fetchone()[0]>0:
        return
    # populate data into created table
    fr= open(file, 'r')
    fr.readline()
    # parse and convert data into unicode
    #data = unicode_csv_reader(fr, delimiter='|')
    # anything can be used as a file if it has .read() and .readline() methods
    data = StringIO.StringIO()
    s=''.join(fr.readlines())
    while(s.find('\r\n')<>-1):
        s=s.replace('\r\n','\n')
    #timer()
    while(s.find('||')<>-1 or s.find('|\n')<>-1 ):
        s=s.replace('||','|0|')
        s=s.replace('|\n','|0\n')
    #timer()
    #print s.split('\t')[:2]
    #exit(0)
    data.write(s)
    data.seek(0)
    try:
        cur.copy_from(data, table_name,sep='|')
        conn.commit()
        print "Table ",table_name," populated ",timer()
    except psycopg2.DatabaseError, e:
        if conn:
            conn.rollback()
        print 'Error %s' % e    
    fr.close()

The output I see:

ME_Features_20121001.txt Table ME_Features_20121001 created 1.44s None Table ME_Features_20121001 populated 1.48s None

FM_Features_20121001.txt Table FM_Features_20121001 created 67.92s None Table FM_Features_20121001 populated 0.22s None

NationalFile_20121001.txt (700mb) Table NationalFile_20121001 created 9.34s None Table NationalFile_20121001 populated 4963.18s None

NJ_Features_20121001.txt Table NJ_Features_20121001 created 1.65s None Table NJ_Features_20121001 populated 41.11s None

PW_Features_20121001.txt Table PW_Features_20121001 created 1.73s None Table PW_Features_20121001 populated 0.20s None

Answer 1

How is timer() defined? My blind guess (as you didn't provide its code) is that this function calls print directly to output the measured time but doesn't return anything explicitly, hence None is printed. If it's still unclear, look at the example below:

>>> def test():
...     print 'test'
... 
>>> print 'This is a', test()
This is a test
None
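
If that guess is right, one possible fix is to have the function return the value and let the caller print it. The sketch below is written for Python 3 (the question uses Python 2), and both functions are hypothetical stand-ins, since the real timer() was not posted:

```python
def timer_printing():
    # Hypothetical stand-in for the asker's timer(): it prints the value
    # and falls off the end, so it implicitly returns None.
    print("1.44s")


def timer_returning():
    # Returns the formatted value so the caller decides how to print it.
    return "1.44s"


# The printing variant yields None, which is the stray value in the output.
leftover = timer_printing()
print(leftover is None)

# The returning variant can be used directly inside a print call.
print("Table created", timer_returning())
```

With the returning variant, the measured time lands on the same line as the message and no None follows it.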

I'm not sure what you mean by saying that the time varies for creating and populating tables. The time needed to populate a table obviously depends on the amount of data to insert. The time needed to create a table should be more or less the same in each case, so the 67.92s output does look suspicious, but... are you sure it's measured properly?

Again, my blind guess is that timer() prints the time since the last call. Perhaps you should explicitly reset it before starting the operation you want to measure? I guess that those 60 seconds were spent before calling create_populate_table()...
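
To rule out that kind of measurement error, each operation can be timed from an explicit starting point. This is only a sketch in Python 3: Stopwatch and timed are hypothetical names, not part of the asker's code or of psycopg2, and they assume the work to be measured can be wrapped in a single call:

```python
import time


class Stopwatch:
    """Hypothetical helper: measures time from an explicit reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Start (or restart) the measurement from now.
        self._start = time.perf_counter()

    def elapsed(self):
        # Seconds since the last reset, returned rather than printed.
        return time.perf_counter() - self._start


def timed(label, func, *args):
    # Reset right before the call so earlier work is not counted.
    watch = Stopwatch()
    result = func(*args)
    seconds = watch.elapsed()
    print("%s %.2fs" % (label, seconds))
    return result, seconds


# Example: time a short sleep in place of a real database call.
timed("sleep", time.sleep, 0.01)
```

Wrapping the CREATE TABLE step and the copy step in separate timed calls would show whether the 67.92s really belongs to table creation or to whatever ran before it.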

Answer 1
1 2012-11-06 19:25:39
