How do I open a .dat file in R using SAS code?
I have a dataset that I am trying to read into R, but it is in .dat format. I have been given code for reading the dataset into SAS, but not for reading it into R. I am having trouble translating this into something I can use to get the data into a usable state. Does anyone have any advice?
Here is the SAS code:
/* This program is to read in the SPARCS Diagnosis data table. */
OPTIONS NOCENTER NODATE FORMDLIM=' ' compress=yes pagesize=50;
/*USER INPUT NEEDED*/
%let file=".\SPARCS_Extract\SPARCS_DIAG.dat"; *Set to your path;
data SPARCS_DIAG ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile &file. delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 /*obs = 1000*/;
informat clm_trans_id $12. ;
informat disch_yr $4. ;
informat dx_type_cd $2. ;
informat seq_id 8. ;
informat clm_type_cd $1. ;
informat upide $128. ;
informat dx_catgy_cd $2. ;
informat dx_grp_cd $3. ;
informat dx_cd $7. ;
informat poa_ind $1. ;
informat DX_VERS_TYPE_CD $5. ;
informat clm_key $12. ;
informat actv_flag $1. ;
informat ltst_flag $1. ;
informat processed_dt $8. ;
informat created_by $20. ;
informat last_updd_dt $8. ;
informat last_updd_by $20. ;
informat src_nm $30. ;
informat insert_row_dt $8. ;
informat abort_ind $1. ;
informat hiv_ind $1. ;
format clm_trans_id $12. ;
format disch_yr $4. ;
format dx_type_cd $2. ;
format seq_id 8. ;
format clm_type_cd $1. ;
format upide $128. ;
format dx_catgy_cd $2. ;
format dx_grp_cd $3. ;
format dx_cd $7. ;
format poa_ind $1. ;
format DX_VERS_TYPE_CD $5. ;
format clm_key $12. ;
format actv_flag $1. ;
format ltst_flag $1. ;
format processed_dt $8. ;
format created_by $20. ;
format last_updd_dt $8. ;
format last_updd_by $20. ;
format src_nm $30. ;
format insert_row_dt $8. ;
format abort_ind $1. ;
format hiv_ind $1. ;
input
clm_trans_id $
disch_yr $
dx_type_cd $
seq_id
clm_type_cd $
upide $
dx_catgy_cd $
dx_grp_cd $
dx_cd $
poa_ind $
DX_VERS_TYPE_CD $
clm_key $
actv_flag $
ltst_flag $
processed_dt $
created_by $
last_updd_dt $
last_updd_by $
src_nm $
insert_row_dt $
abort_ind $
hiv_ind $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
The analogous way to read the .dat file in R is the base function read.table. Both read.csv (for comma-separated values) and read.delim (for tab-separated values) are wrappers around it.
Additionally, the SAS code specifies the data type of every column, with lengths: a $ marks a character column, and the remaining columns are numeric or integer. Therefore, use the colClasses argument, which can also run faster because R no longer has to infer the types while parsing.
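The mapping from SAS informats to R classes described above can be sketched in a few lines. This is only an illustration: the helper name sas_to_col_classes is made up for this example, only the first five columns are shown, and it assumes every informat without a leading $ should be read as numeric (use "integer" instead if you know the values are whole numbers).

```r
# Informats copied from the first few lines of the SAS program.
informats <- c(
  clm_trans_id = "$12.",
  disch_yr     = "$4.",
  dx_type_cd   = "$2.",
  seq_id       = "8.",
  clm_type_cd  = "$1."
)

# Hypothetical helper (not part of base R or any package):
# a leading "$" means character, anything else is treated as numeric.
sas_to_col_classes <- function(informats) {
  classes <- ifelse(startsWith(informats, "$"), "character", "numeric")
  names(classes) <- names(informats)  # keep the column names for colClasses
  classes
}

col_classes <- sas_to_col_classes(informats)
print(col_classes)
```

The resulting named vector can be passed directly as colClasses, provided the names match the column names in the file exactly.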
Do note: R does not require lengths for strings or numbers, and R is case sensitive (i.e., DX_VERS_TYPE_CD != dx_vers_type_cd), so the names in colClasses must match the column names in the file exactly.
SPARCS_DIAG <- read.table(
  "SPARCS_DIAG.dat",
  header = TRUE,  # firstobs=2 in the SAS code skips line 1, which suggests a header row
  sep = "|",
  colClasses = c(
    "clm_trans_id" = "character",
    "disch_yr" = "character",
    "dx_type_cd" = "character",
    "seq_id" = "integer",
    "clm_type_cd" = "character",
    "upide" = "character",
    "dx_catgy_cd" = "character",
    "dx_grp_cd" = "character",
    "dx_cd" = "character",
    "poa_ind" = "character",
    "DX_VERS_TYPE_CD" = "character",
    "clm_key" = "character",
    "actv_flag" = "character",
    "ltst_flag" = "character",
    "processed_dt" = "character",
    "created_by" = "character",
    "last_updd_dt" = "character",
    "last_updd_by" = "character",
    "src_nm" = "character",
    "insert_row_dt" = "character",
    "abort_ind" = "character",
    "hiv_ind" = "character"
  )
)
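To check the approach without the real file, here is a minimal self-contained sketch. It writes a tiny pipe-delimited file with invented values (not real SPARCS data) and only four of the columns, then reads it back with named colClasses. It assumes the first line of the file holds the column names.

```r
# Write a small pipe-delimited sample file; all values are made up.
tmp <- tempfile(fileext = ".dat")
writeLines(
  c(
    "clm_trans_id|disch_yr|seq_id|dx_cd",
    "000000000012|2019|1|0389",
    "000000000013|2019|2|V270"
  ),
  tmp
)

demo <- read.table(
  tmp,
  header = TRUE,        # line 1 holds the column names
  sep = "|",
  quote = "\"",         # treat only double quotes as quoting characters
  comment.char = "",    # do not treat "#" as the start of a comment
  colClasses = c(
    clm_trans_id = "character",
    disch_yr     = "character",
    seq_id       = "integer",
    dx_cd        = "character"
  )
)

str(demo)
unlink(tmp)
```

Reading the ID and code columns as character keeps their leading zeros, which would be lost if R guessed the columns were numeric.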
However, seeing your comment that you did attempt read.table (possibly without colClasses), the wrappers set some argument defaults that may help, such as quote = "\"" and fill = TRUE. Therefore, consider using those functions but change the sep argument:
SPARCS_DIAG <- read.csv(
  "SPARCS_DIAG.dat",
  sep = "|",
  colClasses = c(
    "clm_trans_id" = "character",
    "disch_yr" = "character",
    "dx_type_cd" = "character",
    ... # REST OF COLUMNS
  )
)

SPARCS_DIAG <- read.delim(
  "SPARCS_DIAG.dat",
  sep = "|",
  colClasses = c(
    "clm_trans_id" = "character",
    "disch_yr" = "character",
    "dx_type_cd" = "character",
    ... # REST OF COLUMNS
  )
)
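As a rough illustration of why the wrapper defaults can matter, the sketch below reads a small invented file containing an apostrophe inside a field and a row with a missing trailing field. The sample values are made up, and whether your real file has either problem is an assumption you would need to check.

```r
# Sample file with two common trouble spots; all values are invented.
tmp <- tempfile(fileext = ".dat")
writeLines(
  c(
    "created_by|src_nm|abort_ind",
    "O'Brien|claims feed|N",   # apostrophe inside a field
    "smith|portal"             # short row: abort_ind is missing
  ),
  tmp
)

# read.delim defaults to quote = "\"" and fill = TRUE, so the apostrophe is
# kept as ordinary text and the short row is padded instead of raising an error.
demo <- read.delim(tmp, sep = "|", colClasses = "character")
print(demo)
unlink(tmp)
```

An unnamed colClasses value such as "character" is recycled across all columns, which is a quick way to get the data in before deciding on final types.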