Insufficient memory when using PROC ASSOC in SAS

I'm trying to run the step shown below, and it fails with this error: ERROR: The SAS System stopped processing this step because of insufficient memory.

The dataset has about 1170 rows and 90 columns. What are my alternatives here?

The log output is below:

proc assoc data=want1 dmdbcat=dbcat pctsup=0.5 out=frequentItems;
  id tid;
  target item_new;
run;


----- Potential 1 item sets = 188 -----
Counting items, records read:    19082
Number of customers:               203
Support level for item sets:         1
Maximum count for a set:           136
Sets meeting support level:        188
Megs of memory used:              0.51

----- Potential 2 item sets = 17578 -----
Counting items, records read:    19082
Maximum count for a set:           119
Sets meeting support level:      17484
Megs of memory used:              1.54

----- Potential 3 item sets = 1072352 -----
Counting items, records read:    19082
Maximum count for a set:           111
Sets meeting support level:    1072016
Megs of memory used:             70.14
Error: Out of memory.  Memory used=2111.5 meg.

Item Set 4 is null.
ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.FREQUENTITEMS may be incomplete.  When this step was stopped there were
         1089689 observations and 8 variables.

From the documentation ( http://support.sas.com/documentation/onlinedoc/miner/em43/assoc.pdf ):

Caution: The theoretical potential number of item sets can grow very quickly. For example, with 50 different items, you have 1225 potential 2-item sets and 19,600 3-item sets. With 5,000 items, you have over 12 million of the 2-item sets, and a correspondingly large number of 3-item sets.

Processing an extremely large number of sets could cause your system to run out of disk and/or memory resources. However, by using a higher support level, you can reduce the item sets to a more manageable number.
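To get a feel for the growth the documentation describes, a quick DATA step can print the theoretical counts. This is only a sketch: it relies on the COMB function, and the value 188 is taken from the "Potential 1 item sets" line of the log in the question. The log reports fewer potential 3-item sets than the full theoretical count, presumably because candidates are built only from 2-item sets that met the support level.

```sas
/* Theoretical number of 2-item and 3-item sets for n distinct items. */
/* 50 and 5000 come from the documentation's example; 188 is the      */
/* number of single items that met the support level in the log.      */
data _null_;
  do n = 50, 188, 5000;
    pairs   = comb(n, 2);   /* n choose 2 */
    triples = comb(n, 3);   /* n choose 3 */
    put n= pairs= comma20. triples= comma20.;
  end;
run;
```

For n = 188 the pair count is 17,578, which matches the "Potential 2 item sets" line in the log.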

So provide a support= option and make sure it is sufficiently high, e.g.:

proc assoc data=want1 dmdbcat=dbcat pctsup=0.5 out=frequentItems support=20;
  id tid;
  target item_new;
run;
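One way to pick a reasonable value before rerunning is to count how many customers each item appears for and see how many items would survive a given threshold. The following is a minimal sketch, assuming want1 is in transactional form with one row per tid/item_new pair (as the id and target statements suggest). The table name item_support and the threshold of 20 are illustrative only.

```sas
/* Count distinct customers (tid) per item. */
proc sql;
  create table item_support as
  select item_new,
         count(distinct tid) as n_customers
  from want1
  group by item_new
  order by n_customers desc;
quit;

/* How many single items would meet a support count of 20? */
proc sql;
  select count(*) as items_kept
  from item_support
  where n_customers >= 20;
quit;
```

If items_kept is still large, the number of 2-item and 3-item candidates will be large too, so it may be worth trying a higher threshold.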

Is there a way to frame the data mining task so that it requires less memory for storage or operations? In other words, do you need all 90 columns, or can you eliminate some? Is there a clear division within the data set, so that some rows could be excluded because PROC ASSOC would not be expected to use them in its findings?

You may very well be up against software memory allocation limits here.
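To see what limit the session is running under, you can print the MEMSIZE system option. This is only a diagnostic sketch. The "Memory used=2111.5 meg" line in the log suggests a ceiling of around 2 GB, but that is a guess, not something the log confirms.

```sas
/* Write the current MEMSIZE setting to the log. */
proc options option=memsize;
run;
```

MEMSIZE is set when SAS starts (on the command line or in the configuration file), so it cannot be raised from inside a running session.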
