SAS Proc Import CSV and missing data
So, I'm trying to import some datasets in SAS and join them. The only problem is that I get this error after joining them:
proc import datafile='filepath/datasetA.csv'
out = dataA
dbms= csv
replace;
run;
proc import datafile='filepath\datasetB.csv'
out = dataB
dbms= csv
replace;
run;
/* combine them all into one dataset*/
data DataC;
set dataA dataB;
run;
ERROR: Variable column_k has been defined as both character and numeric
The column in question looks something like this in both of the data sets that I'm trying to join:
+----------+
| column_k |
+----------+
| 0 |
| 1 |
| 5 |
| 4 |
| NA |
| NA |
| 4 |
| 3 |
| NA |
+----------+
Basically, I would like to import the NA data in that column as 'missing', if that's possible? I need the entire column to remain numeric, as I'm planning on doing some calculations with the data in that column further down the line.
Thanks for your help!
If you wish to continue using Proc IMPORT then you will need to ensure the columns are like-typed. In your case you know column_k should be numeric, so a DATA step can convert the character values to numeric using the INPUT function.
proc import … out = dataA;
proc import … out = dataB;
data dataA;
set dataA;
_num = input(column_k, ?? best12.); /* ?? keeps values such as NA from writing invalid-data notes; they become missing */
drop column_k;
rename _num = column_k;
run;
data dataB;
set dataB;
_num = input(column_k, ?? best12.); /* ?? keeps values such as NA from writing invalid-data notes; they become missing */
drop column_k;
rename _num = column_k;
run;
data want;
set dataA dataB;
run;
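One reason the two imports can disagree: for delimited files PROC IMPORT decides each column's type from the first rows only (20 by default), so a file with NA near the top gets a character column_k while a file whose first rows are all digits gets a numeric one. A minimal sketch, assuming the same placeholder path as in the question and a release that accepts GUESSINGROWS=MAX (older releases need an explicit row count instead), that scans every row and then checks the resulting type:

```sas
/* Sketch only: 'filepath/datasetA.csv' is the placeholder path from the question */
proc import datafile='filepath/datasetA.csv'
    out = dataA
    dbms = csv
    replace;
    guessingrows = max;   /* examine all rows instead of the default 20 */
run;

/* List the variables and their types to confirm how column_k was read */
proc contents data=dataA;
run;
```

If NA occurs anywhere in the file, column_k will then come in as character, so the INPUT conversion shown above is still needed; the benefit is only that both data sets are typed the same way.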
In a larger scope, mismatched data types for a column name can occur in a scenario such as dealing with multi-year imports.
Suppose the older data can't be re-read and the newer data has a different column type.
For the case of wanting numeric values, one approach is to have a macro that writes source code that converts, if necessary, specified variables from character to numeric.
Example:
%enforce_num (perm.loans2015, age amount remaining, out=work.loans2015)
%enforce_num (perm.loans2016, age amount remaining, out=work.loans2016)
%enforce_num (perm.loans2017, age amount remaining, out=work.loans2017)
data loans_3yrs;
set work.loans2015-loans2017;
run;
Going back to your simpler case:
proc import … out = dataA;
proc import … out = dataB;
%enforce_num(dataA, column_k)
%enforce_num(dataB, column_k)
data want;
set dataA dataB;
run;
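What would the macro enforce_num look like? It would have to check the type of each listed variable and, for those that are character, generate a DATA step that converts them to numeric while keeping the original column order: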
%macro enforce_num(data, vars, out=&data);
/*
* Arguments:
* data - name of input data set
* vars - space separated list of variables that must be numeric, convert type if necessary
* out - name of output data set, default same as input data set
*
* Output:
* - Unchanged data set if data and out are the same and no conversion needed
* - Changed data set if some columns in data need conversion to numeric
* - replaces data if out is same as data
* - replaces out if out is different than data
* - the column order of the changed data set will be the same as the original data set
*/
%local dsid index index2 vars varname vartype varnames debug;
%let index2 = 0; %* number of variables determined to be requiring conversion;
%let debug = 0;
%if &debug %then %put NOTE: &SYSMACRONAME: data=%superq(data);
%let dsid = %sysfunc(open(&data));
%if &dsid %then %do;
%do index = 1 %to %sysfunc(attrn(&dsid, nvars));
%let varname = %sysfunc(varname(&dsid, &index));
%let varnames = &varnames &varname;
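%if %sysfunc(indexw(%upcase(&vars), %upcase(&varname))) %then %do; %* INDEXW takes the list first, then the word to find. Compare case-insensitively;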
%if C = %sysfunc(vartype(&dsid, &index)) %then %do;
%* Data contains character variable requiring enforcement;
%let index2 = %eval(&index2+1);
%local convert&index2;
%let convert&index2 = &varname;
%let varnames = &varnames ___&index2 ; %* Variables that will be converted will be named ___<#> during conversion;
%end;
%end;
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%else %do;
%* The data set could not be opened: report why and stop;
%put %sysfunc(sysmsg());
%return;
%end;
%*put NOTE: &=vars;
%*put NOTE: &=varnames;
%if &index2 = 0 %then %do;
%* No columns need to be converted to numeric, copy to out if necessary;
%if &data ne &out %then %do;
data &out;
set &data;
run;
%end;
%return;
%end;
%* Some columns need to be converted to numeric;
%* Ensure the converted column is at the same position (varnum) as in the original data set;
data &out;
retain &varnames;
set &data;
%do index = 1 %to &index2;
___&index = input(&&convert&index,?? best12.);
%end;
drop
%do index = 1 %to &index2;
&&convert&index
%end;
;
rename
%do index = 1 %to &index2;
___&index = &&convert&index
%end;
;
run;
%put NOTE: ------------------------------------------------;
%put NOTE: &data has been subjected to numeric enforcement.;
%put NOTE: ------------------------------------------------;
%mend enforce_num;
proc import is a guessing procedure and works by examining a few rows of data. This is a problem because Excel data cells have no data type whatsoever. A column can have text, date, datetime and numeric values in different cells.
So, it is better to use an infile statement with specified variable types:
filename input 'filepath/datasetA.csv';
data dataA;
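infile input dsd truncover firstobs=2; /* dsd treats the file as comma-separated; firstobs=2 starts reading at the second line, skipping the header */
input column_k; /* list every input variable here, in file order. Non-numeric values such as NA are read as missing (with an invalid-data note in the log). To read column_k as character instead, use "input column_k :$100." with a suitable length */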
run;
filename input clear;
Input (csv file):
+----------+
| column_k |
+----------+
| 0 |
| 1 |
| 5 |
| 4 |
| NA |
| NA |
| 4 |
| 3 |
| NA |
+----------+
Output (SAS dataset dataA):
+----------+
| column_k |
+----------+
| 0 |
| 1 |
| 5 |
| 4 |
| . |
| . |
| 4 |
| 3 |
| . |
+----------+
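If the invalid-data notes that NA produces in the log are unwanted, one option is a user-defined informat that maps NA to missing explicitly. This is a minimal sketch, not part of the answer above: the informat name na_num is made up for illustration, and the path is the same placeholder as in the question.

```sas
/* Hypothetical informat: the text NA becomes missing, anything else is read as a number */
proc format;
    invalue na_num (upcase)
        'NA'  = .
        other = [32.];
run;

data dataA;
    /* dsd: comma-separated values; firstobs=2 skips the header line */
    infile 'filepath/datasetA.csv' dsd truncover firstobs=2;
    /* the colon modifier applies the informat to each delimited field */
    input column_k : na_num.;
run;
```

Reading both files this way gives a numeric column_k in each, so they can be combined with a plain SET statement.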