PostgreSQL performance problem on INSERT

Question

I have created a table in PostgreSQL. This is its definition:

CREATE TABLE "Scratch"
( id uuid NOT NULL,
  text_1 text,
  text_2 text,
  text_3 text,
  text_4 text,
  ts time with time zone,
  CONSTRAINT pk PRIMARY KEY (id)
);

I then used a Python program to insert one million rows, with a 2000-byte text value in each of the text_* columns. This is my script:

import random
import string
import time

import psycopg2

conn = psycopg2.connect(database="Test", user="postgres", password="postgres",
                        host="localhost", port="5432")
print("connection success")

cur = conn.cursor()
alphabet = string.ascii_uppercase + string.digits

# uuid_generate_v4() is provided by the uuid-ossp extension.
query = """INSERT INTO "Scratch" (id, text_1, text_2, text_3, text_4, ts)
           VALUES (uuid_generate_v4(), %s, %s, %s, %s, current_timestamp)"""

start = time.time()
for each in range(1000000):
    # Four random 2000-character values, one per text_* column.
    values = [''.join(random.choice(alphabet) for _ in range(2000))
              for _ in range(4)]
    # One INSERT statement per row.
    cur.execute(query, values)

# Single commit after all rows have been inserted.
conn.commit()

end = time.time()

print(end - start)
print("Load complete")

The insert took:

end - start = 22997.991 seconds; 22997.991 / 60 ≈ 383 minutes

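I can speed up the inserts by using bulk inserts and by committing more often, but what really worries me is how many minutes a SELECT over the 1M rows takes.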
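For reference, the batching mentioned above could look roughly like the following. This is only a sketch, not a tuned loader: the helper names (`build_batch_insert`, `insert_in_batches`) and the batch size of 1000 are assumptions made for illustration, and `conn` is expected to be an open psycopg2 connection. It sends one multi-row INSERT per batch and commits after each batch.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits
# One VALUES tuple per row; the id and ts are generated by the server.
ROW_TEMPLATE = "(uuid_generate_v4(), %s, %s, %s, %s, current_timestamp)"


def random_text(length=2000):
    """Random uppercase/digit string, as in the original script."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def build_batch_insert(rows):
    """Return (sql, params) for a single multi-row INSERT covering `rows`."""
    sql = ('INSERT INTO "Scratch" (id, text_1, text_2, text_3, text_4, ts) VALUES '
           + ", ".join([ROW_TEMPLATE] * len(rows)))
    # Flatten [(a, b, c, d), ...] into [a, b, c, d, ...] to match the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


def insert_in_batches(conn, total_rows, batch_size=1000):
    """Insert `total_rows` random rows, one statement and one commit per batch."""
    cur = conn.cursor()
    for start in range(0, total_rows, batch_size):
        count = min(batch_size, total_rows - start)
        rows = [tuple(random_text() for _ in range(4)) for _ in range(count)]
        sql, params = build_batch_insert(rows)
        cur.execute(sql, params)
        conn.commit()

# Hypothetical usage with an open psycopg2 connection:
#   insert_in_batches(conn, total_rows=1000000, batch_size=1000)
```

psycopg2 also ships `psycopg2.extras.execute_values`, which builds a similar multi-row statement; the hand-rolled version is shown here only to make the mechanics visible.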

It has been 20 minutes now, and I still have not seen any result from this simple query:

SELECT id, text_1, text_2, text_3, text_4, ts   
  FROM "Scratch";

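I am sure it is doing a full table scan.

But how do I improve the performance of this table? I was planning to add an index on the "ts" field, but how would I force this simple query to use the new index?

What is the right approach?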

Answer 1

This is too long for a comment.

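Of course your query is doing a full table scan. It returns every column of every row in the table. The problem is probably not PostgreSQL but the consumption of the returned data, which is a lot of data.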
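To put a rough number on "a lot of data", here is a back-of-the-envelope estimate. It assumes, as in the question's script, four text columns of 2000 single-byte ASCII characters per row, and it ignores the id and ts columns, per-row overhead, and protocol overhead, so the real figure would be somewhat higher.

```python
ROWS = 1_000_000
TEXT_COLUMNS = 4
BYTES_PER_VALUE = 2000  # ASCII uppercase letters and digits: 1 character = 1 byte


def text_payload_bytes(rows=ROWS, cols=TEXT_COLUMNS, size=BYTES_PER_VALUE):
    """Raw size of the text values alone, in bytes."""
    return rows * cols * size


total = text_payload_bytes()
print(f"{total / 1e9:.1f} GB")      # 8.0 GB
print(f"{total / 2**30:.2f} GiB")   # 7.45 GiB
```

So the unrestricted SELECT asks the client to receive and hold roughly 8 GB of text, which by itself can account for a long wait regardless of how the table is scanned.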

A simple query like this may help you get a feel for the performance:

select count(*)
from "Scratch"

Or even something like:

SELECT id, text_1, text_2, text_3, text_4, ts   
FROM "Scratch"
LIMIT 10;
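The two probe queries above could be timed from Python along these lines. This is a minimal sketch: `time_query` and `run_probes` are hypothetical helper names, and `cursor` is assumed to be any DB-API cursor, such as one obtained from the psycopg2 connection in the question.

```python
import time

PROBES = [
    'SELECT count(*) FROM "Scratch"',
    'SELECT id, text_1, text_2, text_3, text_4, ts FROM "Scratch" LIMIT 10',
]


def time_query(cursor, sql):
    """Run `sql`, fetch every row, and return (elapsed_seconds, row_count)."""
    start = time.perf_counter()
    cursor.execute(sql)
    rows = cursor.fetchall()
    return time.perf_counter() - start, len(rows)


def run_probes(cursor, queries=PROBES):
    """Time each probe query and collect (sql, elapsed_seconds, row_count)."""
    results = []
    for sql in queries:
        elapsed, row_count = time_query(cursor, sql)
        results.append((sql, elapsed, row_count))
    return results

# Hypothetical usage with the cursor from the question's script:
#   for sql, elapsed, n in run_probes(cur):
#       print(f"{elapsed:.3f}s  {n} rows  {sql}")
```

If both probes return quickly while the full SELECT does not, that points at the volume of data being transferred rather than at the scan itself.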

Solution 1
2 2015-09-20 13:32:32
