SAS Proc Import CSV and missing data

I am trying to import a few datasets into SAS and combine them. The only problem is that combining them produces the error shown after the code:

    proc import datafile='filepath/datasetA.csv'
    out = dataA
    dbms= csv
    replace;
    run;


    proc import datafile='filepath\datasetB.csv'
    out = dataB
    dbms= csv
    replace;
    run;



    /* combine them all into one dataset*/


    data DataC;
    set dataA dataB;

    run;



    ERROR: Variable column_k has been defined as both character and numeric

In both datasets I am trying to combine, the problematic column looks like this:

+----------+
| column_k |
+----------+
| 0        |
| 1        |
| 5        |
| 4        |
| NA       |
| NA       |
| 4        |
| 3        |
| NA       |
+----------+

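Basically, if possible, I would like the NA values in that column to be imported as missing. I need the whole column to stay numeric because I plan to do some math on the data in that column.

Thanks for your help!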

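If you want to keep using Proc IMPORT, you need to make sure the column has the same type in every data set. In your case you know column_k should be numeric, so a DATA step can use the INPUT function to convert the character values to numbers.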

proc import … out = dataA;
proc import … out = dataB;

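/* Convert column_k to numeric. If it is already numeric in one data set, that step can be skipped. */
data dataA;
  set dataA;
  /* ?? suppresses invalid-data notes; non-numeric text such as NA becomes missing (.) */
  _num = input(column_k, ?? best12.);
  drop column_k;
  rename _num = column_k;
run;

data dataB;
  set dataB;
  _num = input(column_k, ?? best12.);
  drop column_k;
  rename _num = column_k;
run;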

data want;
  set dataA dataB;
run;
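
To see the conversion in isolation, here is a minimal sketch that does not depend on any CSV file. The data set names demo and demo_num and the inline values are made up for illustration; only the INPUT call mirrors the step above.

```sas
/* Hypothetical sample: column_k read as character, as Proc IMPORT would do when it sees NA */
data demo;
  input column_k $;
  datalines;
0
1
NA
4
;
run;

/* Convert to numeric; NA cannot be read as a number, so it becomes missing (.) */
data demo_num;
  set demo;
  _num = input(column_k, ?? best12.);
  drop column_k;
  rename _num = column_k;
run;

proc print data=demo_num;
run;
```

The ?? modifier only affects logging: invalid values still become missing, but without the invalid-argument notes and without setting _ERROR_.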

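More broadly, mismatched data types for the same column name can arise in scenarios such as processing imports from several years.

Suppose the old data cannot be re-read and the new data has a different column type.

When numeric values are required, one approach is a macro that generates source code to convert the specified variables from character to numeric whenever necessary.

Example: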

%enforce_num (perm.loans2015, age amount remaining, out=work.loans2015)
%enforce_num (perm.loans2016, age amount remaining, out=work.loans2016)
%enforce_num (perm.loans2017, age amount remaining, out=work.loans2017)

data loans_3yrs; 
  set work.loans2015-loans2017;
run;

Back to your simpler case:

proc import … out = dataA;
proc import … out = dataB;

%enforce_num(dataA, column_k)
%enforce_num(dataB, column_k)

data want;
  set dataA dataB;
run;

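What would enforce_num look like? It must:

  • scan the metadata of the input data set
  • determine whether a variable is one of those specified and is of character type
    • write source code that converts the variable to numeric
    • maintain the original variable order
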
%macro enforce_num(data, vars, out=&data);

  /*
   * Arguments:
   *   data - name of input data set
   *   vars - space separated list of variables that must be numeric, convert type if necessary
   *   out  - name of output data set, default same as input data set
   *
   * Output:
   *   - Unchanged data set if data and out are the same and no conversion needed
   *   - Changed data set if some columns in data need conversion to numeric
   *     - replaces data if out is same as data
   *     - replaces out if out is different than data
   *     - the column order of the changed data set will be the same as the original data set
   */

  %local dsid index index2 vars varname vartype varnames debug;

  %let index2 = 0;  %* number of variables determined to be requiring conversion;
  %let debug = 0;

  %if &debug %then %put NOTE: &SYSMACRONAME: data=%superq(data);

  %let dsid = %sysfunc(open(&data));
  %if &dsid %then %do;
    %do index = 1 %to %sysfunc(attrn(&dsid, nvars));
      %let varname = %sysfunc(varname(&dsid, &index));

      %let varnames = &varnames &varname;

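      %* INDEXW takes the list to search first and the word to find second. Compare ignoring case;
      %if %sysfunc(indexw(%upcase(&vars), %upcase(&varname))) %then %do;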
        %if C = %sysfunc(vartype(&dsid, &index)) %then %do;
          %* Data contains character variable requiring enforcement;
          %let index2 = %eval(&index2+1);
          %local convert&index2;
          %let convert&index2 = &varname;

          %let varnames = &varnames ___&index2 ;   %* Variables that will be converted will be named ___<#> during conversion;
        %end;
      %end;
    %end;
    %let dsid = %sysfunc(close(&dsid));
  %end;
  %else %do;
    %* The data set could not be opened, so report the reason and stop;
    %put %sysfunc(sysmsg());
    %return;
  %end;

  %*put NOTE: &=vars;
  %*put NOTE: &=varnames;

  %if &index2 = 0 %then %do;
    %* No columns need to be converted to numeric, copy to out if necessary;
    %if &data ne &out %then %do;
      data &out;
        set &data;
      run;
    %end;
    %return;
  %end;

  %* Some columns need to be converted to numeric;
  %* Ensure the converted column is at the same position (varnum) as in the original data set;

  data &out;
    retain &varnames;

    set &data;

    %do index = 1 %to &index2;
      ___&index = input(&&convert&index,?? best12.);
    %end;

    drop
      %do index = 1 %to &index2;
        &&convert&index
      %end;
    ;

    rename
      %do index = 1 %to &index2;
        ___&index = &&convert&index
      %end;
    ;
  run;

  %put NOTE: ------------------------------------------------;
  %put NOTE: &data has been subjected to numeric enforcement.;
  %put NOTE: ------------------------------------------------;
%mend enforce_num;
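
A small way to try the macro without any external files is sketched below. The two inline data sets and the id column are hypothetical test data; the sketch assumes the enforce_num macro above has already been compiled in the session.

```sas
/* Hypothetical test data: column_k is character in dataA and numeric in dataB */
data dataA;
  input id column_k $;
  datalines;
1 0
2 NA
3 5
;
run;

data dataB;
  input id column_k;
  datalines;
4 4
5 3
;
run;

/* dataA needs conversion; dataB is already numeric and should be left unchanged */
%enforce_num(dataA, column_k)
%enforce_num(dataB, column_k)

data want;
  set dataA dataB;
run;

/* Check the resulting type and values of column_k */
proc contents data=want;
run;

proc print data=want;
run;
```

If the conversion behaves as intended, the SET statement no longer raises the character/numeric conflict, and the NA row shows a missing value for column_k.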

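proc import is a guessing procedure: it decides each column's type by examining only a few rows of data. This is a problem because Excel cells (and CSV fields) carry no data type of their own. A single column can hold text, dates, datetimes, and numeric values in different cells.

Therefore, it is better to use an infile statement and specify the variable types yourself: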

filename input 'filepath/datasetA.csv';

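data dataA;
   /* dsd: comma-delimited values; firstobs=2 starts reading at the second line, skipping the header */
   infile input dsd truncover firstobs=2;
   /* List the input variables here in file order. ?? reads invalid values such as NA as missing without log notes. */
   /* To read column_k as character instead, use: input column_k :$100.; (with a length of your choice) */
   input column_k ??;
run;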

filename input clear;
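
The same idea extends to files with several columns. The sketch below is only an illustration: the columns id and name are hypothetical, and the rows are read from inline datalines rather than a real CSV file so that it can be run as-is.

```sas
/* Hypothetical three-column layout; types are declared explicitly instead of guessed */
data demo;
  infile datalines dsd truncover firstobs=2;  /* firstobs=2 skips the header line */
  length id 8 name $20 column_k 8;
  input id name $ column_k ??;                /* NA in column_k is read as missing (.) */
  datalines;
id,name,column_k
1,alpha,0
2,beta,NA
3,gamma,5
;
run;

proc print data=demo;
run;
```

Because every type is fixed in the DATA step, two files read this way always agree on column_k, whatever values appear in their first rows.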

Input (csv file):

+----------+
| column_k |
+----------+
| 0        |
| 1        |
| 5        |
| 4        |
| NA       |
| NA       |
| 4        |
| 3        |
| NA       |
+----------+

Output (as data set dataA):

+----------+
| column_k |
+----------+
|        0 |
|        1 |
|        5 |
|        4 |
|        . |
|        . |
|        4 |
|        3 |
|        . |
+----------+

