
How do I open a .dat file in R using SAS code?

I have a dataset that I am trying to read into R, but it is in .dat format. I was given code for reading the dataset into SAS, but not into R, and I am having trouble translating it into something that gets the data into a usable state. Does anyone have any advice? Here is the SAS code:

/* This program is to read in the SPARCS Diagnosis data table. */
OPTIONS NOCENTER NODATE FORMDLIM=' ' compress=yes pagesize=50;

/*USER INPUT NEEDED*/
%let file=".\SPARCS_Extract\SPARCS_DIAG.dat";  *Set to your path;
    data SPARCS_DIAG    ;

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile &file. delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 /*obs = 1000*/;
   informat clm_trans_id $12. ;
   informat disch_yr $4. ;
   informat dx_type_cd $2. ;
   informat seq_id 8. ;
   informat clm_type_cd $1. ;
   informat upide $128. ;
   informat dx_catgy_cd $2. ;
   informat dx_grp_cd $3. ;
   informat dx_cd $7. ;
   informat poa_ind $1. ;
   informat DX_VERS_TYPE_CD $5. ;
   informat clm_key $12. ;
   informat actv_flag $1. ;
   informat ltst_flag $1. ;
   informat processed_dt $8. ;
   informat created_by $20. ;
   informat last_updd_dt $8. ;
   informat last_updd_by $20. ;
   informat src_nm $30. ;
   informat insert_row_dt $8. ;
   informat abort_ind $1. ;
   informat hiv_ind $1. ;
   
   format clm_trans_id $12. ;
   format disch_yr $4. ;
   format dx_type_cd $2. ;
   format seq_id 8. ;
   format clm_type_cd $1. ;
   format upide $128. ;
   format dx_catgy_cd $2. ;
   format dx_grp_cd $3. ;
   format dx_cd $7. ;
   format poa_ind $1. ;
   format DX_VERS_TYPE_CD $5. ;
   format clm_key $12. ;
   format actv_flag $1. ;
   format ltst_flag $1. ;
   format processed_dt $8. ;
   format created_by $20. ;
   format last_updd_dt $8. ;
   format last_updd_by $20. ;
   format src_nm $30. ;
   format insert_row_dt $8. ;
   format abort_ind $1. ;
   format hiv_ind $1. ;

input
   clm_trans_id $
   disch_yr $
   dx_type_cd $
   seq_id 
   clm_type_cd $
   upide $
   dx_catgy_cd $
   dx_grp_cd $
   dx_cd $
   poa_ind $
   DX_VERS_TYPE_CD $
   clm_key $
   actv_flag $
   ltst_flag $
   processed_dt $
   created_by $
   last_updd_dt $
   last_updd_by $
   src_nm $
   insert_row_dt $
   abort_ind $
   hiv_ind $
;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
run;

The analogous import in R for this .dat file is the base function read.table; read.csv (for comma-separated values) and read.delim (for tab-separated values) are wrappers around it.

Additionally, the SAS code specifies the data type and length of every column (a $ informat translates to character; the remaining columns are numeric, which R can read as numeric or integer). Therefore, use the colClasses argument, which can also run faster because R does not have to infer the types while parsing.
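As a rough illustration of that mapping, the sketch below derives a colClasses vector from a few of the informats in the SAS program. The `informats` vector is typed in by hand here (only four columns, for brevity), and mapping every non-$ informat to "numeric" is an assumption; "integer" would also be reasonable for a column such as seq_id if it only holds whole numbers.

```r
# SAS informats copied from the question's program (first four columns only);
# a leading "$" marks a character informat.
informats <- c(
    clm_trans_id = "$12.",
    disch_yr     = "$4.",
    dx_type_cd   = "$2.",
    seq_id       = "8."
)

# "$..." -> "character"; anything else -> "numeric" (an assumption, see above).
# The widths (12, 4, 2, 8) are dropped because R does not need them.
is_char <- startsWith(informats, "$")
col_classes <- setNames(
    ifelse(is_char, "character", "numeric"),
    names(informats)
)

print(col_classes)
```

The resulting named vector can be passed directly as the colClasses argument.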

Do note: R does not require lengths for strings or numbers, and R is case sensitive (i.e., DX_VERS_TYPE_CD != dx_vers_type_cd).
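Because named colClasses entries are matched against the column names exactly, it can help to compare them with the file's header line before importing. The following is a minimal sketch under the assumption that the first line of the file holds pipe-delimited column names; the temporary file and the `wanted` vector are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the real file: a header line plus one record.
path <- tempfile(fileext = ".dat")
writeLines(c(
    "clm_trans_id|DX_VERS_TYPE_CD|seq_id",
    "000123|ICD10|1"
), path)

# Names we intend to use in colClasses (note the lower-case second name).
wanted <- c("clm_trans_id", "dx_vers_type_cd", "seq_id")

# Read only the header line and split it on the literal pipe character.
header_names <- strsplit(readLines(path, n = 1), "|", fixed = TRUE)[[1]]

# Names that would not match because of spelling or case.
unmatched <- setdiff(wanted, header_names)
print(unmatched)
```

Any name reported in `unmatched` should be corrected so that it matches the header exactly.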

SPARCS_DIAG <- read.table(
    "SPARCS_DIAG.dat",
    # The SAS code skips the first line (firstobs=2), so it is assumed to be
    # a header row; named colClasses are matched against these column names.
    header = TRUE,
    sep = "|",
    colClasses = c(
        "clm_trans_id" = "character",
        "disch_yr" = "character",
        "dx_type_cd" = "character",
        "seq_id" = "integer",
        "clm_type_cd" = "character",
        "upide" = "character",
        "dx_catgy_cd" = "character",
        "dx_grp_cd" = "character",
        "dx_cd" = "character",
        "poa_ind" = "character",
        "DX_VERS_TYPE_CD" = "character",
        "clm_key" = "character",
        "actv_flag" = "character",
        "ltst_flag" = "character",
        "processed_dt" = "character",
        "created_by" = "character",
        "last_updd_dt" = "character",
        "last_updd_by" = "character",
        "src_nm" = "character",
        "insert_row_dt" = "character",
        "abort_ind" = "character",
        "hiv_ind" = "character"
    )
)
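To see what the explicit column classes buy you, here is a small self-contained sketch that writes a few made-up pipe-delimited records to a temporary file and reads them back. The sample values are invented for illustration, and the file is assumed to start with a header row; the point is that identifier-like columns keep their leading zeros when read as character.

```r
# Made-up sample data in the same pipe-delimited layout (four columns only).
path <- tempfile(fileext = ".dat")
writeLines(c(
    "clm_trans_id|disch_yr|seq_id|dx_cd",
    "000123|2019|1|E119",
    "000124|2019|2|0389"
), path)

diag_demo <- read.table(
    path,
    header = TRUE,   # first line holds the column names
    sep = "|",
    colClasses = c(
        clm_trans_id = "character",
        disch_yr = "character",
        seq_id = "integer",
        dx_cd = "character"
    )
)

# Character columns keep values such as "000123" and "0389" intact.
str(diag_demo)
```

Without colClasses, R would infer clm_trans_id as a number and drop the leading zeros.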

However, since your comment says you already tried read.table (possibly without colClasses), note that the wrappers set some defaults that may help, such as quote = "\"" and fill = TRUE. Therefore, consider using those functions, changing only the sep argument:

SPARCS_DIAG <- read.csv(
    "SPARCS_DIAG.dat",
    sep = "|",
    colClasses = c(
        "clm_trans_id" = "character",
        "disch_yr" = "character",
        "dx_type_cd" = "character"
        # ... REST OF COLUMNS, as in the read.table call (add a comma first)
    )
)

SPARCS_DIAG <- read.delim(
    "SPARCS_DIAG.dat",
    sep = "|",
    colClasses = c(
        "clm_trans_id" = "character",
        "disch_yr" = "character",
        "dx_type_cd" = "character"
        # ... REST OF COLUMNS, as in the read.table call (add a comma first)
    )
)
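The effect of those wrapper defaults can be sketched with a small made-up file. Here quote = "\"" means an apostrophe is treated as ordinary text (read.table by default also treats ' as a quote character), and fill = TRUE pads a short record with missing values, which is roughly the role MISSOVER plays in the SAS code. The sample records are invented for illustration.

```r
# Made-up sample: one field contains an apostrophe, and the last record
# is missing its final field.
path <- tempfile(fileext = ".dat")
writeLines(c(
    "clm_trans_id|created_by|seq_id",
    "000123|O'BRIEN|1",
    "000124|SMITH"
), path)

# read.delim defaults: header = TRUE, quote = "\"", fill = TRUE.
short_demo <- read.delim(
    path,
    sep = "|",
    colClasses = c(
        clm_trans_id = "character",
        created_by = "character",
        seq_id = "integer"
    )
)

print(short_demo)
```

The short second record is kept, with seq_id set to NA.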


 