
R weighted crosstab p-value vs SAS

http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/viewer.htm#procstat_freq_gettingstarted01.htm

I am getting a different answer in R than the one given on the SAS documentation page linked above. The SAS code and its results come first, followed by the R code and its results:

data SummerSchool; 
   input Gender $ Internship $ Enrollment $ Count @@; 
   datalines;
boys  yes yes 35   boys  yes no 29 
boys   no yes 14   boys   no no 27
girls yes yes 32   girls yes no 10  
girls  no yes 53   girls  no no 23
;
proc freq data=SummerSchool order=data;
   tables Internship*Enrollment / chisq;
   weight Count;

run;

Output

Table of Internship by Enrollment (weighted counts)

             Enrollment
Internship   yes    no
yes           67    39
no            67    50

Statistic                      DF    Value    Prob
Chi-Square                      1   0.8189  0.3655
Likelihood Ratio Chi-Square     1   0.8202  0.3651
Continuity Adj. Chi-Square      1   0.5899  0.4425
Mantel-Haenszel Chi-Square      1   0.8153  0.3666

=========================

Now I will give the R code, using both the weights package and the survey package.

> tt$nnn=as.numeric(tt$count)
> attach(tt)
> tt
    sex internship enrollment count nnn
1  boys        yes        yes    35  35
2  boys         no        yes    14  14
3 girls        yes        yes    32  32
4 girls         no        yes    53  53
5  boys        yes         no    29  29
6  boys         no         no    27  27
7 girls        yes         no    10  10
8 girls         no         no    23  23
> library(plyr)
> count(tt,c('internship','enrollment'),wt_var='nnn')
  internship enrollment freq
1         no         no   50
2         no        yes   67
3        yes         no   39
4        yes        yes   67
> library(weights)
> wtd.chi.sq(internship,enrollment,weight=nnn)
    Chisq        df   p.value 
0.0293791 1.0000000 0.8639066
> library(survey) 
> tt.d=svydesign(ids = ~1, data =tt,weights =tt$nnn)
> svychisq(~internship+enrollment,tt.d)

        Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~internship + enrollment, tt.d)
F = 0.023599, ndf = 1, ddf = 7, p-value = 0.8822

The two R results essentially agree with each other (0.86 and 0.88) but are completely different from the SAS results (between 0.37 and 0.44). Is it possible that SAS is giving a one-sided result and R a two-sided result? If so, what are the pros and cons of a one-sided versus a two-sided result in this situation?

I think you are misusing the weights argument of the survey package. Lumley's book that accompanies the package distinguishes three possible interpretations of the term "weights". The SAS example uses the "case weights" meaning, in which each row stands for Count identical observations. You can get equivalent results with ordinary R code. Compare this output with the SAS Continuity Adj. Chi-Square:

chisq.test(   xtabs( count ~ internship+enrollment, data=tt) )

    Pearson's Chi-squared test with Yates' continuity correction

data:  xtabs(count ~ internship + enrollment, data = tt)
X-squared = 0.58989, df = 1, p-value = 0.4425
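
The other rows of the SAS table can be approached the same way. The following is a minimal sketch, not part of the original answer: it rebuilds tt from the listing in the question (the construction itself is an assumption, since the question never shows it), turns off the Yates correction with correct = FALSE to get the plain Pearson statistic, and computes the likelihood-ratio and Mantel-Haenszel statistics by hand. The Mantel-Haenszel line relies on the 2x2 identity Q_MH = (n - 1) / n * Q_Pearson and would not carry over to larger tables.

```r
# Rebuild the data frame shown in the question (assumed construction).
tt <- data.frame(
  sex        = rep(rep(c("boys", "girls"), each = 2), times = 2),
  internship = rep(c("yes", "no"), times = 4),
  enrollment = rep(c("yes", "no"), each = 4),
  count      = c(35, 14, 32, 53, 29, 27, 10, 23)
)

# Counts are used as case (frequency) weights: each row adds `count` cases.
tab <- xtabs(count ~ internship + enrollment, data = tt)

# Pearson chi-square without the continuity correction
# (compare with the SAS "Chi-Square" row).
pearson <- chisq.test(tab, correct = FALSE)

# Likelihood-ratio chi-square, G = 2 * sum(O * log(O / E)).
g_stat <- 2 * sum(tab * log(tab / pearson$expected))
g_p    <- pchisq(g_stat, df = 1, lower.tail = FALSE)

# Mantel-Haenszel chi-square via the 2x2 identity (n - 1) / n * Pearson.
n       <- sum(tab)
mh_stat <- (n - 1) / n * unname(pearson$statistic)
mh_p    <- pchisq(mh_stat, df = 1, lower.tail = FALSE)

print(pearson)
print(c(LR = g_stat, LR_p = g_p, MH = mh_stat, MH_p = mh_p))
```

If the table is built as above, the three statistics should line up with the SAS rows Chi-Square, Likelihood Ratio Chi-Square, and Mantel-Haenszel Chi-Square.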

The survey package is designed to let you replicate the results of the more sophisticated SAS procedures, namely PROC SURVEYMEANS, PROC SURVEYFREQ, and PROC SURVEYREG. It can also provide the same capabilities as SUDAAN.
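
To make the "case weights" reading concrete, here is a small sketch (again with tt rebuilt by assumption from the listing in the question) that expands the eight summary rows into one row per student and then runs an ordinary unweighted test. This is the interpretation SAS applies with its WEIGHT statement. By contrast, the ddf = 7 in the svychisq output in the question suggests the design there was treated as a sample of only 8 observations carrying sampling weights.

```r
# Rebuild the data frame shown in the question (assumed construction).
tt <- data.frame(
  sex        = rep(rep(c("boys", "girls"), each = 2), times = 2),
  internship = rep(c("yes", "no"), times = 4),
  enrollment = rep(c("yes", "no"), each = 4),
  count      = c(35, 14, 32, 53, 29, 27, 10, 23)
)

# Repeat each summary row `count` times: one row per individual case.
cases <- tt[rep(seq_len(nrow(tt)), times = tt$count),
            c("sex", "internship", "enrollment")]

# An unweighted crosstab of the expanded data ...
tab_cases <- table(cases$internship, cases$enrollment)

# ... holds the same cell counts as the frequency-weighted xtabs() call.
tab_wt <- xtabs(count ~ internship + enrollment, data = tt)
same_counts <- all(as.vector(tab_cases) == as.vector(tab_wt))

# Default chisq.test() applies the Yates continuity correction for 2x2 tables.
res_cases <- chisq.test(tab_cases)
print(res_cases)
```

Expanding rows is only practical for small integer counts; for anything larger, passing the counts through xtabs() as in the answer gives the same table without materialising every case.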
