
Slow loading into ultra-wide tables on Redshift

I have a few ultra-wide tables (1500+ columns) which I am trying to load data into. I am loading GZIPped files from S3 using a manifest file. The DISTKEY of the table is 'date', and each file in S3 contains information for one particular date only. The columns are mostly FLOATs, with a few DATEs and VARCHARs.

Each file has approximately 16000 rows with 1500 columns, and is approximately 84 MiB gzipped. Even following best practices for loading, we are seeing very poor load performance: 100 records/s or approximately 300 kB/s.

Are there any suggestions for improving load speeds specifically for ultra-wide tables? I'm loading data into narrower tables using similar techniques with fairly reasonable speeds, so I have reason to believe that this is an artifact of the width of the table.
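For reference, the load is issued with a COPY roughly like the one below; the table name, bucket path, and IAM role are placeholders:

    COPY wide_metrics
    FROM 's3://my-bucket/manifests/2017-01-01.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
    MANIFEST                 -- the FROM path points at a manifest file, not a data file
    GZIP                     -- source files are gzip-compressed
    FORMAT AS JSON 'auto';   -- one JSON object per row, keys matched to column names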

Having files separated by the DISTKEY field does not necessarily improve load speed. Amazon Redshift will use multiple nodes to import files in parallel, and the node that reads a particular input file will not necessarily be the same node used to store the data. Therefore, data will be sent between nodes, which is expected during a load process.

If the table has been newly created, then the load process will automatically use the first 100,000 rows to determine an optimal compression type for each column. It will then delete that data and restart the load process. To avoid this, either create the table with compression defined on each column or run the COPY command with the COMPUPDATE option set to OFF. If, on the other hand, there is already data in the table, then this automatic process will be skipped.
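As a sketch (hypothetical table and column names), compression can be declared up front, or the analysis can be skipped at COPY time:

    -- Define column encodings at table creation so COPY does not need to sample rows.
    CREATE TABLE wide_metrics (
        metric_date DATE ENCODE az64,
        label       VARCHAR(64) ENCODE zstd,
        metric_0001 DOUBLE PRECISION ENCODE zstd,
        metric_0002 DOUBLE PRECISION ENCODE zstd
        -- ... remaining float columns
    )
    DISTKEY (metric_date);

    -- Or skip the automatic compression analysis on an empty table:
    COPY wide_metrics
    FROM 's3://my-bucket/manifests/2017-01-01.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
    MANIFEST GZIP FORMAT AS JSON 'auto'
    COMPUPDATE OFF;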

It is possible that the load process is consuming too much memory and spilling to disk. Try increasing wlm_query_slot_count to increase the memory available to the COPY command. However, I'm not sure that this parameter applies to COPY commands (it is described for 'queries', and a COPY might not qualify as a query).
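If you want to try it, the slot count is a session-level setting; something along these lines (the slot count of 3 is just an illustrative value, names are placeholders as above):

    -- Claim extra WLM slots (and their memory) for this session before loading.
    SET wlm_query_slot_count TO 3;

    COPY wide_metrics
    FROM 's3://my-bucket/manifests/2017-01-01.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
    MANIFEST GZIP FORMAT AS JSON 'auto';

    -- Return to the default of one slot per query.
    SET wlm_query_slot_count TO 1;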

Adding for future reference: One optimization that helped was switching from Gzipped JSON to CSV files. This reduced each file from 84 MiB to 11 MiB and tripled the loading speed.
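In COPY terms, that change amounts to swapping the JSON option for CSV (names below are placeholders as above):

    COPY wide_metrics
    FROM 's3://my-bucket/manifests/2017-01-01.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
    MANIFEST
    GZIP
    CSV;   -- column order in the files must match the table definition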
