
Macro variables inside data step

I need to save the value of a certain variable in a data step to a macro variable and then use that macro variable in the same data step. I tried CALL SYMPUT, but if I've understood correctly, a macro variable created that way cannot be used inside the same data step (it's assigned at the end of the data step, I guess?).

Here is a simplified example. I have a list of data fields t1,...,t100 representing something happening at certain times, with events represented by numbers, and a data field t_start giving the starting time of the process I'm interested in for every observation. I want to check whether I have all the data from t_start onward, and drop the observation otherwise. I want to proceed as follows.

DATA WANT;
    SET HAVE;

    CALL SYMPUT('START_TIME', t_start);

    DO I=&START_TIME. TO 100;
        IF t_&I. = . THEN DELETE;
    END;
RUN;

This doesn't work, I think for the reasons mentioned above. Is there a workaround?

Remarks:

  1. I simplified the situation; the real case I'm looking at is more complex (e.g. the variables are not called t1,...,t100 but something with a bit more structure). If possible, I would like something as close as possible to the approach I gave above, as different solutions might not be applicable to my case. Of course, if this is not possible then any solution is more than welcome!
  2. I tried looking at RESOLVE, but it doesn't seem to do what I'm looking for (or at least I don't understand it well enough to make it do what I want).
  3. As a last resort, I could try to solve the problem using two data steps, one defining the macro variables and the other using them to check and delete the undesired observations. I would prefer to avoid this solution if possible.

Update: I solved the problem using arrays, as suggested in the answers.

You are trying to use the values of the dataset variables t_start and i to figure out which variable to test. That is what arrays are for.

DATA WANT;
  SET HAVE;
  array t t1-t100;
  DO I=t_start TO 100;
    IF t(i) = . THEN DELETE;
  END;
RUN;

No need for macro variables, much less macro variables that can travel backwards in time and modify the code of a data step after it has already started running.
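The question mentions that the real variables are not named t1,...,t100. As a minimal sketch of how the same array approach might carry over, an array can group variables by listing their names explicitly. The dataset and the variable names below (visit_a, visit_b, lab_x, lab_y) are made up for illustration and are not from the question.

```sas
/* Hypothetical data: these variable names are assumptions for illustration. */
data have;
  input id visit_a visit_b lab_x lab_y t_start;
  datalines;
1 5 6 7 8 1
2 . 6 7 8 2
3 5 6 . 8 2
4 5 6 7 . 4
;
run;

data want;
  set have;
  /* The array only groups existing variables, so their names need no pattern. */
  /* Position 1 is visit_a, position 2 is visit_b, and so on. */
  array t {*} visit_a visit_b lab_x lab_y;
  /* Check only the positions from t_start onward. */
  do i = t_start to dim(t);
    if missing(t{i}) then delete;
  end;
  drop i;
run;
```

With this sample data, ids 1 and 2 should be kept (the missing visit_a for id 2 lies before its t_start), while ids 3 and 4 should be deleted. The order of the names in the ARRAY statement defines what "position" means, so it has to match the order that t_start refers to.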

Using arrays may be a better approach. Given what you've posted, this works. If it doesn't match what you need, please post more details about your situation.

data demo;
  array t(10);
  do row = 1 to 100;
    do i = 1 to 10;
      t(i) = rand('integer', 1, 5);
    end;
    start = rand('integer', 1, 10);
    output;
  end;
run;

data test;
  set demo;
  array t(10);

  do i = start to dim(t);
    if t(i) < 2 then do;
      delete;  * stops processing this observation, so nothing after it runs;
      leave;   * exits loop (never reached after delete);
    end;
  end;
run;

By the time a data step is running, the step has already been 'compiled' and all ampersand (&) macro variable references have already been resolved. A running compiled step cannot alter its own source code.

If you submitted your code twice, the first time would log WARNING: Apparent symbolic reference START_TIME not resolved. The second time would not log that warning for START_TIME, and the step would be using the value left over from the prior submission.
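To make the timing concrete, here is a small sketch (not from the original answer) contrasting an ampersand reference, which is resolved before the step runs, with the SYMGET function, which looks the macro variable up while the step is executing. The value 7 and the variable name from_macro are assumptions chosen for illustration.

```sas
data _null_;
  t_start = 7;
  /* CALL SYMPUTX stores the value while the step is executing. */
  call symputx('START_TIME', t_start);
  /* SYMGET reads the macro variable at execution time, so it sees the
     value stored on the line above. An &START_TIME. reference here
     would instead have been resolved before the step started. */
  from_macro = input(symget('START_TIME'), best32.);
  put from_macro=;
run;
```

The log should show from_macro=7. Note that this only returns a value at run time; it still cannot change which variable name appears in the compiled code, and t_start is already available in the step anyway, which is why the array approach fits the original problem better.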

Suppose your data record has many variables, plus two sentinel variables whose values are the names of the variables that mark the start and stop of some processing. Although this is an unwieldy data construct, you can use an array to mediate access to the varying set of variables to be processed.

For example:

data have;
input 
 a  b  c  d  e  f  g  h start $ stop $ ; datalines;
 1  2  3  4  5  6  7  8  d  e
11 12 13 14 15 16 17 18  a  b
 0  1  1  2  3  5  8 11  c  h
 1  1  1  1  1  .  1  1  a  e  wont be deleted because . is at f
 1  2  3  4  .  6  7  8  a  h
run;

data want;
  set have;
  array num a--h;
  do i = 1 to dim(num);
    if vname(num(i)) = start then startindex=i;
    if vname(num(i)) = stop  then stopindex=i;
  end;

  do i = min(startindex,stopindex) to max(startindex,stopindex);
    if missing(num(i)) then delete;
  end;
run;
