SAS Teradata FastLoad issue

Is there a fast way to load data to Teradata? I need to load 350,000 account numbers to Teradata, and the job has been running for about 4.5 hours now.

I am just using a data step. Below is my code. Thank you.

libname myid  teradata authdomain=IDWPRD server=IDWPRD database=myid mode=teradata connection=global;

proc delete data=myid.tera1;
run;

proc sql; 
create table out.REQ_1_1_05l as 
select distinct ACCOUNT_NB as ACCT_NB
FROM OUT.REQ_1_1_05;
quit;

data myid.tera1;
set OUT.REQ_1_1_05l ;
run;

Use the bulkload=yes option in your libname statement:

libname myid  teradata authdomain=IDWPRD server=IDWPRD database=myid mode=teradata connection=global bulkload=yes;

data myid.want;
     set have;
run;

Additional performance information specific to Teradata can be found here: http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a001405937.htm

This is most often the result of a bad practice. If 350,000 records take more than a few minutes to load, even without a bulk-load utility, that is surprising to me (unless the table is very wide).

In Teradata, table rows are distributed across Access Module Processors (AMPs). Row distribution depends on the uniqueness of the defined primary index column: the more unique the primary index column, the more even the data distribution, and vice versa. Uneven distribution of table rows across the AMPs results in skewed data.

The data step below creates a Teradata table with the first column as the primary index. If the first column has few distinct values, a skewed table is created. A skewed table wastes space and can make your queries take an unusually long time to finish.

data myid.tera1;
set OUT.REQ_1_1_05l;
run;
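
To see whether a loaded table is actually skewed, one option is to count rows per AMP with Teradata's hash functions through explicit SQL pass-through. This is only a sketch: it assumes the myid libref from the question is already assigned, that your SAS release supports connect using, and that ACCT_NB is the primary index column of myid.tera1.

```sas
proc sql;
  /* Reuse the connection defined by the myid libref (assumed to be assigned). */
  connect using myid;
  /* Rows per AMP for the hash of ACCT_NB; very uneven counts suggest skew. */
  select * from connection to myid (
    select hashamp(hashbucket(hashrow(ACCT_NB))) as amp_no,
           count(*) as row_cnt
    from myid.tera1
    group by 1
    order by 2 desc
  );
  disconnect from myid;
quit;
```

If the counts are roughly equal across AMPs, skew is probably not the cause of the slow load.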

The data set option dbcreate_table_opts= can define the primary index explicitly. It takes the keywords primary index followed by the column name in parentheses.

data myid.tera1
    (dbcreate_table_opts='primary index(yourcolumn)');
set OUT.REQ_1_1_05l;
run;

Please select an appropriate, ideally unique, primary index; this is often the most important thing in Teradata.

Please look at the paper below, which explains common issues SAS programmers may run into when using Teradata.

https://www.lexjansen.com/mwsug/2016/SA/MWSUG-2016-SA11.pdf

You can also use the FastLoad utility, as shown below. FastLoad performs bulk loading and makes moving data from SAS to Teradata tremendously fast.

data myid.tera1
    (fastload=yes dbcreate_table_opts='primary index(yourcolumn)');
set OUT.REQ_1_1_05l;
run;

Look at the paper by Jeff Bailey if you want to know everything about SAS and Teradata data movement.

https://support.sas.com/.../EffectivelyMovingSASDataintoTeradata.pdf

Finally, check whether your table myid.tera1 is a SET table, which does not allow duplicate rows, although this may not be a major factor. If you run show table in Teradata SQL Assistant, the output tells you whether it is a SET or MULTISET table. A SET table does not allow row-level duplicates, so every row is checked against the existing rows before insertion, which adds to the load time.

Add the dbcommit= option to your libname statement. The default is 1 record, i.e. it commits on every record. Play around with this value to find the optimal setting for your configuration.
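
If Teradata SQL Assistant is not at hand, the same show table request can probably be sent from SAS through explicit pass-through. This is a sketch that assumes the myid libref from the question is assigned and that your SAS release supports connect using; the returned DDL text should begin with CREATE SET TABLE or CREATE MULTISET TABLE.

```sas
proc sql;
  /* Reuse the connection defined by the myid libref (assumed to be assigned). */
  connect using myid;
  /* Print the table DDL and look for SET or MULTISET after CREATE. */
  select * from connection to myid (
    show table myid.tera1
  );
  disconnect from myid;
quit;
```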

libname myid teradata authdomain=IDWPRD server=IDWPRD database=myid mode=teradata connection=global dbcommit=5000 ;

https://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a001371531.htm
