
SAS Enterprise Guide / SQL Performance

I'm looking for a little guidance on a SAS/SQL performance issue I'm having. In SAS Enterprise Guide, I've created a program that creates a table. This table has about 90k rows:

CREATE TABLE test AS (
  SELECT id, SUM(myField)
  FROM table1
  GROUP BY id
)

I have a much larger table with millions of rows. Each row has an id. I want to sum values on this table, using only ids present in the 'test' table. I tried this:

CREATE TABLE test2 AS(
  SELECT big.id, SUM(big.myOtherField)
  FROM big
  INNER JOIN test
    ON test.id = big.id
  GROUP BY big.id
)

The problem I'm having is that the second query takes forever to run against the big table with millions of records. I thought the inner join on the subset of ids would help (and maybe it does), but I wanted to make sure I was doing everything I could to speed it up.

I don't have any way to get information on the indexing of the underlying database. I'm more interested in getting the opinion of someone who has more SQL and SAS experience than me.

From what you show in your question, you are joining two SAS data sets, not two database objects. In any case, you can speed up the processing by defining indexes on the JOIN columns used in each table. Assuming you have permission to do so, here are examples:

proc sql;
   create index id on big(id);
   create index id on test(id);
quit;

Of course, you should probably check the table definition before doing that. You can use the "describe" statement to see the structure:

proc sql;
   describe table big;
quit;
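As a rough sketch of how one might check whether an index exists and whether the join actually uses it: PROC CONTENTS lists any indexes defined on a data set, and the MSGLEVEL=I system option asks SAS to write INFO notes to the log, which include index usage. This assumes `big` and `test` are SAS data sets in the WORK library; the alias `total_other` is made up for illustration.

```sas
/* Assumption: big and test live in the WORK library. */
/* PROC CONTENTS reports the variables and any indexes on the data set. */
proc contents data=work.big;
run;

/* MSGLEVEL=I writes INFO notes to the log, including index usage. */
options msglevel=i;

proc sql;
   create table test2 as
   select big.id,
          sum(big.myOtherField) as total_other  /* hypothetical alias */
   from big
   inner join test
      on test.id = big.id
   group by big.id;
quit;

/* Restore the default message level. */
options msglevel=n;
```

If the log shows no INFO note about an index, the optimizer most likely chose a different join strategy for this query.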

Indexes improve access performance at the cost of disk space and update maintenance. Once created, the indexes will be a permanent part of the SAS data set and will be automatically updated if you use SQL INSERT or DELETE statements. But be aware that the indexes will be deleted if you recreate the data set with a simple data step.
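If the data set does get rebuilt by a data step, one way to keep the index is to request it as part of the rebuild, or to add it back afterwards. The following is only a sketch: `big_source` is a hypothetical name for wherever the rows come from, and the PROC DATASETS step assumes `test` does not already have an index on `id`.

```sas
/* Rebuild big and create a simple index on id in the same step. */
/* big_source is a hypothetical input data set. */
data big(index=(id));
   set big_source;
run;

/* Add an index to an existing data set in place, without rewriting it. */
/* Assumes test has no index named id yet.                              */
proc datasets library=work nolist;
   modify test;
   index create id;
quit;
```

The INDEX= data set option avoids a separate pass over the data, while PROC DATASETS is handy when the table already exists and only the index is missing.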

On the other hand, if these tables really are in an external database (like Oracle, for example), you have a different challenge. If that's the case, I'd ask a new question and provide a complete example of the SAS code you are using (including any libname statements).

If you are working with non-SAS data, i.e., data that resides in a SQL database (or a NoSQL database, for that matter), you will see significant improvements in performance using pass-through SQL or, if it is supported and you have the licenses for it, in-database processing.
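For illustration, an explicit pass-through version of the question's second query might look roughly like the sketch below. It assumes both `big` and `test` live in the same Oracle schema and that SAS/ACCESS Interface to Oracle is licensed; the connection alias `ora`, the macro variables `&dbuser` and `&dbpass`, the path `mydb`, and the column alias `total_other` are all placeholders, not details from the question.

```sas
proc sql;
   /* Hypothetical connection details; replace with real credentials. */
   connect to oracle as ora (user=&dbuser password=&dbpass path="mydb");

   /* The inner query is sent to Oracle as written and runs there; */
   /* only the aggregated result comes back to SAS.                */
   create table test2 as
   select *
   from connection to ora
   (
      select big.id,
             sum(big.myOtherField) as total_other
      from big
      inner join test
         on test.id = big.id
      group by big.id
   );

   disconnect from ora;
quit;
```

The text inside the parentheses must be valid SQL for the target database rather than for SAS, so SAS functions and formats cannot be used there.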

One important point about PROC SQL versus pass-through SQL: by default, PROC SQL duplicates the original source data in SAS data sets before doing the work, whereas pass-through just requests the result set from the source data provider. In short, a table with 5 million rows will take a lot longer to process with PROC SQL (even if you are only interested in about 1% of the data) than if you just have to pull that 1% of the data across the network using the pass-through mechanism.
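To see how much of a given query is really handed to the database, one option is to turn on SAS/ACCESS tracing and read the log. This is a sketch under assumptions: `dblib` is a hypothetical libref that has already been assigned to the external database with a LIBNAME statement, and `total_other` is a made-up alias.

```sas
/* Write the SQL that SAS sends to the DBMS into the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
   /* dblib is a hypothetical libref pointing at the external database. */
   create table test2 as
   select b.id,
          sum(b.myOtherField) as total_other
   from dblib.big as b
   inner join dblib.test as t
      on t.id = b.id
   group by b.id;
quit;

/* Turn tracing off again. */
options sastrace=off;
```

If the log shows only plain SELECT statements against each table rather than the join and GROUP BY, the rows are being pulled into SAS and processed there.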


