Loading denormalized data into a database

Question

I have a PostgreSQL database with two tables:

CREATE TABLE invoices (
    id bigint,
    some_data varchar
);

CREATE TABLE charges (
    id bigint,
    invoice_id bigint,
    some_data varchar
);

I'm trying to load a CSV file with the following format into this database:

invoice_id, invoice_data, charge_id, charge_data

For example, I could have the following lines in my CSV file:

1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00,  4, $1.00
2, $2.00,  5, $1.00
3, $11.00, 6, $11.00

This data should correspond to the following records in the database:

SELECT * FROM invoices;
  id | some_data
-----+-------------
  1  | $10.00
  2  | $2.00
  3  | $11.00

SELECT * FROM charges;
  id | invoice_id | some_data
-----+------------+-------------
  1  | 1          | $2.00
  2  | 1          | $5.00
  3  | 1          | $3.00
  4  | 2          | $1.00
  5  | 2          | $1.00
  6  | 3          | $11.00

Is there a best practice for loading this kind of data? At the moment, I am loading this file into an intermediary table and processing it with a PHP script, which is quite inefficient. Is there a better way? Should I load the file into an intermediary table and then use a stored procedure to split up the information? Or should I process the .csv file directly and split the information in some sort of script?

Answer 1

You can use the COPY command to load the data first into an intermediary table whose structure matches the CSV (for example: COPY intermediary_table FROM '/path/to/csv/charges.csv' DELIMITER ',' CSV;), then select the data into each table. Because invoices and charges already exist, use INSERT ... SELECT rather than SELECT ... INTO, which creates a new table and fails if the target already exists. The first query would be INSERT INTO invoices (id, some_data) SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table; the second would be INSERT INTO charges (id, invoice_id, some_data) SELECT DISTINCT charge_id, invoice_id, charge_data FROM intermediary_table;
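The staging-table approach can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server, the function name load_denormalized is hypothetical, and the plain row insert takes the place of COPY. The two INSERT ... SELECT DISTINCT statements are the part that carries over to PostgreSQL.

```python
import csv
import io
import sqlite3

# Sample rows from the question, in the CSV layout
# invoice_id, invoice_data, charge_id, charge_data.
CSV_TEXT = """\
1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00,  4, $1.00
2, $2.00,  5, $1.00
3, $11.00, 6, $11.00
"""


def load_denormalized(conn, csv_file):
    """Stage the CSV rows, then split them into invoices and charges."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE invoices (id INTEGER, some_data TEXT);
        CREATE TABLE charges (id INTEGER, invoice_id INTEGER, some_data TEXT);
        -- Staging table; its columns mirror the CSV layout.
        CREATE TABLE intermediary_table (
            invoice_id INTEGER, invoice_data TEXT,
            charge_id INTEGER, charge_data TEXT
        );
    """)
    # Stand-in for COPY: insert the parsed rows, trimming the spaces after commas.
    rows = [[field.strip() for field in row] for row in csv.reader(csv_file) if row]
    cur.executemany("INSERT INTO intermediary_table VALUES (?, ?, ?, ?)", rows)
    # One invoice row per distinct (invoice_id, invoice_data) pair.
    cur.execute("""
        INSERT INTO invoices (id, some_data)
        SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table
    """)
    # One charge row per distinct charge, keeping its invoice reference.
    cur.execute("""
        INSERT INTO charges (id, invoice_id, some_data)
        SELECT DISTINCT charge_id, invoice_id, charge_data FROM intermediary_table
    """)
    # The staging table is no longer needed once both targets are filled.
    cur.execute("DROP TABLE intermediary_table")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_denormalized(conn, io.StringIO(CSV_TEXT))
invoices = conn.execute("SELECT id, some_data FROM invoices ORDER BY id").fetchall()
charges = conn.execute(
    "SELECT id, invoice_id, some_data FROM charges ORDER BY id"
).fetchall()
```

In PostgreSQL the staging step would be the COPY command from the answer, which reads a file on the database server; psql's \copy reads a file on the client side instead. Keeping the split inside the database avoids a row-by-row round trip through an external script.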

BTW, you most likely do not need bigint unless you expect billions of rows; a PostgreSQL integer column holds values up to 2,147,483,647.

Answer 1: accepted, 2013-10-25 18:09:22
