Loading Denormalized Data Into A Database

I have a database (Postgres) with two tables:

CREATE TABLE invoices (
    id bigint,
    some_data varchar
);

CREATE TABLE charges (
    id bigint,
    invoice_id bigint,
    some_data varchar
);

I'm trying to load a CSV file with the following format into this database:

invoice_id, invoice_data, charge_id, charge_data

For example, I could have the following lines in my CSV file:

1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00,  4, $1.00
2, $2.00,  5, $1.00
3, $11.00, 6, $11.00

This data should correspond to the following records in the database:

SELECT * FROM invoices;
  id | some_data
-----+-------------
  1  | $10.00
  2  | $2.00
  3  | $11.00

SELECT * FROM charges;
  id | invoice_id | some_data
-----+------------+-------------
  1  | 1          | $2.00
  2  | 1          | $5.00
  3  | 1          | $3.00
  4  | 2          | $1.00
  5  | 2          | $1.00
  6  | 3          | $11.00

Is there a best practice for loading this kind of data? At the moment, I am loading the file into an intermediary table and processing it with a PHP script, which is quite inefficient. Is there a better way? Should I load it into an intermediary table and then use a stored procedure to split up the information? Or should I process the .csv file directly and split the information in some sort of script?

You can use the COPY command to load the data first into an intermediary table whose structure matches the CSV (for example: COPY intermediary_table FROM '/path/to/csv/charges.csv' DELIMITER ',' CSV;), then select the data into each table. The first query would be SELECT DISTINCT invoice_id, invoice_data INTO invoices FROM intermediary_table; the second would be SELECT DISTINCT charge_id, invoice_id, charge_data INTO charges FROM intermediary_table. Note that SELECT ... INTO creates a new table, so if invoices and charges already exist (as in the question), use INSERT INTO invoices SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table instead, and likewise for charges.
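The staging-then-split approach can be sketched end to end as follows. This is a minimal illustration, not the definitive way to do it: it uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a server, and it parses the CSV in Python where Postgres would use COPY. The function name load_denormalized and the staging column names are assumptions made for the example; the two INSERT ... SELECT DISTINCT statements are the part that carries over.

```python
import csv
import io
import sqlite3

# Sample rows from the question, in the order
# invoice_id, invoice_data, charge_id, charge_data.
CSV_TEXT = """\
1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00,  4, $1.00
2, $2.00,  5, $1.00
3, $11.00, 6, $11.00
"""


def load_denormalized(conn, csv_text):
    """Stage the CSV rows, then split them into invoices and charges.

    Hypothetical helper for illustration; sqlite3 stands in for Postgres.
    """
    conn.executescript("""
        CREATE TABLE invoices (id bigint, some_data varchar);
        CREATE TABLE charges (id bigint, invoice_id bigint, some_data varchar);
        CREATE TABLE intermediary_table (
            invoice_id bigint, invoice_data varchar,
            charge_id bigint, charge_data varchar
        );
    """)
    # Stand-in for COPY: parse the CSV here, trimming spaces after commas.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [[field.strip() for field in row] for row in reader if row]
    conn.executemany("INSERT INTO intermediary_table VALUES (?, ?, ?, ?)", rows)
    # One invoice row per distinct (invoice_id, invoice_data) pair.
    conn.execute("""
        INSERT INTO invoices (id, some_data)
        SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table
    """)
    # One charge row per distinct charge, keeping its parent invoice id.
    conn.execute("""
        INSERT INTO charges (id, invoice_id, some_data)
        SELECT DISTINCT charge_id, invoice_id, charge_data FROM intermediary_table
    """)
    # The staging table is no longer needed once both inserts are done.
    conn.execute("DROP TABLE intermediary_table")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_denormalized(conn, CSV_TEXT)
invoices = conn.execute(
    "SELECT id, some_data FROM invoices ORDER BY id").fetchall()
charges = conn.execute(
    "SELECT id, invoice_id, some_data FROM charges ORDER BY id").fetchall()
```

Because the split happens in two set-based statements inside the database, no row-by-row script is needed; in Postgres the staging step would be the COPY shown above, and the staging table could be created as a temporary table.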

BTW, you most likely do not need to use bigint (unless you expect billions of rows).
