Loading Denormalized Data Into A Database
I have a database (Postgres) with two tables:
CREATE TABLE invoices (
    id bigint,
    some_data varchar
);
CREATE TABLE charges (
    id bigint,
    invoice_id bigint,
    some_data varchar
);
I'm trying to load a CSV file with the following format into this database:
invoice_id, invoice_data, charge_id, charge_data
For example, I could have the following lines in my CSV file:
1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00, 4, $1.00
2, $2.00, 5, $1.00
3, $11.00, 6, $11.00
This data should correspond to the following records in the database:
SELECT * FROM invoices;
id | some_data
-----+-------------
1 | $10.00
2 | $2.00
3 | $11.00
SELECT * FROM charges;
id | invoice_id | some_data
-----+------------+-------------
1 | 1 | $2.00
2 | 1 | $5.00
3 | 1 | $3.00
4 | 2 | $1.00
5 | 2 | $1.00
6 | 3 | $11.00
Is there a best practice for loading this kind of data? At the moment, I am loading the file into an intermediary table and processing it with a PHP script, which is quite inefficient. Is there a better way? Should I load it into an intermediary table and then use a stored procedure to split up the information? Or should I process the .csv file directly and split the information in some sort of script?
You can use the COPY command to load the data first into an intermediary table whose structure matches the CSV (for example: COPY intermediary_table FROM '/path/to/csv/charges.csv' DELIMITER ',' CSV;), then select the data into each table. Because invoices and charges already exist, use INSERT INTO ... SELECT rather than SELECT ... INTO, which creates a new table and fails if the target already exists. The first query would be INSERT INTO invoices SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table; the second would be INSERT INTO charges SELECT DISTINCT charge_id, invoice_id, charge_data FROM intermediary_table.
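The split step can be tried end to end without a Postgres server. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Postgres and the csv module as a stand-in for COPY (both are assumptions made so the example runs anywhere), and the function name load is hypothetical. The two INSERT ... SELECT DISTINCT statements are the part that carries over to Postgres unchanged.

```python
import csv
import io
import sqlite3

# Sample rows from the question: invoice_id, invoice_data, charge_id, charge_data.
CSV_TEXT = """\
1, $10.00, 1, $2.00
1, $10.00, 2, $5.00
1, $10.00, 3, $3.00
2, $2.00, 4, $1.00
2, $2.00, 5, $1.00
3, $11.00, 6, $11.00
"""


def load(conn, csv_text):
    """Stage the denormalized CSV, then split it into invoices and charges."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE invoices (id bigint, some_data varchar);
        CREATE TABLE charges (id bigint, invoice_id bigint, some_data varchar);
        -- Staging table whose columns mirror the CSV layout.
        CREATE TABLE intermediary_table (
            invoice_id bigint, invoice_data varchar,
            charge_id bigint, charge_data varchar
        );
    """)

    # Stand-in for COPY: parse the CSV and bulk-insert it into the staging table.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [(int(r[0]), r[1], int(r[2]), r[3]) for r in reader if r]
    cur.executemany("INSERT INTO intermediary_table VALUES (?, ?, ?, ?)", rows)

    # Split step: one set-based statement per target table.
    cur.execute("""
        INSERT INTO invoices (id, some_data)
        SELECT DISTINCT invoice_id, invoice_data FROM intermediary_table
    """)
    cur.execute("""
        INSERT INTO charges (id, invoice_id, some_data)
        SELECT DISTINCT charge_id, invoice_id, charge_data FROM intermediary_table
    """)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, CSV_TEXT)
    print(conn.execute("SELECT * FROM invoices ORDER BY id").fetchall())
    print(conn.execute("SELECT * FROM charges ORDER BY id").fetchall())
```

Doing the split as two set-based statements keeps the work inside the database, which is usually much cheaper than looping over rows in an external script.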
BTW, you most likely do not need bigint unless you expect billions of rows; integer already covers IDs up to about 2.1 billion.