Inserting file records into a Postgres DB with Clojure JDBC takes much longer than with Python psycopg2

Question

I am trying to insert records into a Postgres DB, and it takes about 3 hours, while it takes 40 seconds using Python psycopg2 and the cursor.copy_from method.

What is wrong with my code? Using clojure.java.jdbc/db-do-prepared also takes about 3 hours. Please help!

The file size is 175 MB and it has 409,854 records.

(require '[clojure.java.io :refer [reader]]
         '[clojure.java.jdbc]
         '[clojure-csv.core])

(defn- str<->int [str]
  (let [n (read-string str)]
    (if (integer? n) n)))

(with-open [file (reader "/path/to/foo.txt")]
  (try
    (doseq [v (clojure-csv.core/parse-csv file)]
      ;; executes one INSERT statement per CSV row
      (clojure.java.jdbc/insert! db :records
                                 nil
                                 [(v 0) (v 1) (v 2) (str<->int (v 3))]))
    (println "Records inserted successfully")
    (catch java.sql.SQLException e
      (println (.getNextException e) e))))
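Separately from the speed problem, read-string is a risky way to parse CSV cells: it runs the full Clojure reader on untrusted input, and with *read-eval* at its default of true, #= forms in the data are evaluated. A minimal sketch of a stricter parser, assuming the fourth column holds plain base-10 integers; str->int is a hypothetical replacement name, not part of any library:

```clojure
;; Hypothetical stricter replacement for str<->int (not a library function).
(defn str->int
  "Parses s as a base-10 long, ignoring surrounding whitespace.
  Returns nil for nil, blank or non-integer input."
  [s]
  (when (string? s)
    (try
      (Long/parseLong (.trim ^String s))
      (catch NumberFormatException _ nil))))

(str->int "42")   ; => 42
(str->int "4.5")  ; => nil
(str->int "abc")  ; => nil
```

Unlike read-string, this never evaluates anything and rejects decimals, symbols and reader forms outright.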

Answer 1

It's probably because your Clojure version doesn't use batching. You insert rows one by one, each one triggering its own commit.

If you want to do it in Clojure, then you need to partition the rows from the CSV file and insert! each chunk as one batch commit. You need to use the last arity of insert!, which accepts multiple col-val-vecs. Sample code (not checked, just to show the idea):

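(require '[clojure.java.io :refer [reader]]
         '[clojure.java.jdbc :as jdbc]
         '[clojure-csv.core :as csv])

(defn row->col-spec [v]
  [(v 0) (v 1) (v 2) (str<->int (v 3))])

(with-open [csv-file (reader "/path/to/foo.txt")]
  (try
    (->> csv-file
         (csv/parse-csv)
         (map row->col-spec)
         ;; partition-all keeps the final, shorter chunk; partition would drop it
         (partition-all 50)
         ;; one multi-row insert! per chunk (older clojure.java.jdbc arity;
         ;; newer versions provide insert-multi! for this)
         (map (fn [batch]
                (apply jdbc/insert! db :records ["col1" "col2" "col3" "col4"] batch)))
         (dorun))
    (catch Exception e
      (println e))))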
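One detail matters when batching like this: partition drops a trailing chunk that is shorter than n, while partition-all keeps it. With the question's 409,854 rows and batches of 50, that is 8,197 full batches plus 4 leftover rows. A quick REPL check, using range as a stand-in for the parsed CSV rows:

```clojure
(def rows (range 409854)) ; stand-in for the 409,854 parsed CSV rows

;; partition only returns complete chunks of 50
(count (partition 50 rows))                    ; => 8197
(count (apply concat (partition 50 rows)))     ; => 409850, 4 rows lost

;; partition-all also returns the final, shorter chunk
(count (partition-all 50 rows))                ; => 8198
(count (apply concat (partition-all 50 rows))) ; => 409854
```

Either way, assuming each chunk goes to the database as one batched statement, that is roughly 8,200 round trips instead of 409,854 single-row inserts.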

If you don't have to do it in Clojure, then PostgreSQL's COPY command seems to be the easiest and fastest option:

COPY records FROM '/path/to/foo.txt' WITH (FORMAT csv, DELIMITER ',', NULL 'NULL');

Note that COPY ... FROM 'file' reads the file on the database server; to load a file from the client machine, use psql's \copy meta-command with the same options.

Answer 2

After 4 years, I decided to come back to this problem and share a guide to the solution; I'm sure it will help someone get started.

You can take a look at clojure.java.jdbc/insert-multi! and edit the code appropriately to suit the column types in your database:

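(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv]
         '[clojure.string :as str]
         '[clojure.java.jdbc :as jdbc])

(let [from "/path/to/foo.txt"
      to   "/path/to/temp/foo.txt"]
  ;; 1. Copy only the needed columns (0, 2 and 3) into a temporary CSV file.
  (with-open [reader (io/reader from)
              writer (io/writer to)]
    (->> (csv/read-csv reader)
         ;(drop 1)   ; uncomment if there's a header row
         (map #(list (nth % 0 nil) (nth % 2 nil) (nth % 3 nil)))
         (csv/write-csv writer)))
  ;; 2. Read it back and insert all rows in one batched call.
  ;;    Splitting on "," assumes no field contains a quoted comma.
  (let [fstream     (slurp to)
        streamarray (map #(str/split % #",")
                         (str/split-lines fstream))]
    (jdbc/insert-multi! pg-db              ; connection or {:datasource hk-cp}
                        :tbl_cdrs_da       ; table name
                        [:origin_node_type :origin_transaction_id :da_ua_id] ; columns
                        streamarray)))     ; rows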
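The temporary file is not strictly required, and splitting lines on "," miscounts fields whenever a value contains a quoted comma (which csv/write-csv produces if the input has one). A minimal sketch, assuming the rows from csv/read-csv are used directly; select-columns is a hypothetical helper, and this has not been tested against a database:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of clojure.data.csv or clojure.java.jdbc.
(defn select-columns
  "Keeps only the cells at idxs from each parsed CSV row,
  using nil for cells a short row does not have."
  [idxs rows]
  (map (fn [row] (mapv #(nth row % nil) idxs)) rows))

;; Rows shaped like clojure.data.csv/read-csv output (vectors of strings).
(def parsed [["a" "b" "c" "d"]
             ["e" "f"]])

(select-columns [0 2 3] parsed) ; => (["a" "c" "d"] ["e" nil nil])

;; Naive splitting yields 4 fields for what is really 3 CSV cells:
(str/split "x,\"y,z\",w" #",") ; => ["x" "\"y" "z\"" "w"]
```

The result of (select-columns [0 2 3] (csv/read-csv reader)) could then be handed to jdbc/insert-multi! inside the same with-open, so the lazy sequence is fully consumed before the reader is closed.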

2 answers

Answer 1: score 3, 2016-03-17 11:41:34

Answer 2: score 1, accepted, 2020-04-20 00:39:04
