SAS PROC TABULATE (FREQ)

I have the following question:

Some sample data:

data;
input id article sex count;
datalines;
1 139 1 2
2 139 2 2
3 146 2 1
4 146 2 2
5 146 1 0
6 111 2 10
6 111 1 1
;
run;

Now, I have this code:

proc tabulate;
freq count;
class article sex;
table article, sex /misstext='0';
run;

Is there any difference compared to the following code?

proc tabulate;
var count;
class article sex;
table article, sex*count;
run;

Or does it do exactly the same thing? Which one is recommended?

Take note of the output produced by running the two tabulate variations.

For the data set at hand, the results are the same, just presented differently.

  • In the first, each sex class cell is an implicit frequency ( N ) statistic, weighted by count through the freq statement and implicitly formatted as an integer. These implicit choices are the default behavior in the absence of other statements and options.
  • In the second, each sex class cell is the computed sum of count , formatted with the default 2 decimal places.
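To make the implicit defaults visible, both steps can be written with the statistic and the format spelled out. This is a minimal sketch, assuming the sample data is stored under the name sample (the DATA step in the question is unnamed) and that the integer format 8. is acceptable for both tables:

```sas
/* Sample data from the question; the data set name "sample" is an assumption. */
data sample;
  input id article sex count;
  datalines;
1 139 1 2
2 139 2 2
3 146 2 1
4 146 2 2
5 146 1 0
6 111 2 10
6 111 1 1
;
run;

/* Variation 1: N is requested explicitly instead of relying on the default. */
proc tabulate data=sample;
  freq count;                                 /* each row stands for COUNT observations */
  class article sex;
  table article, sex*n*f=8. / misstext='0';   /* empty cells print as 0 */
run;

/* Variation 2: SUM is requested explicitly and shown without decimals. */
proc tabulate data=sample;
  var count;
  class article sex;
  table article, sex*count*sum*f=8.;
run;
```

With the sample data, both tables should show the same cell values. The row with count 0 is not used by the first step, so that cell is empty there and misstext='0' fills it, while the second step reports a sum of 0 for it.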

If the data set had additional var variables used in the table statement, the statistics to compute and the role of weighting would depend on the kind of presentation you are making and the audience consuming it. You may or may not want count frequency weighting to affect those statistical computations.

Ask 5 people for a recommendation and you might get 6 answers!

From the online documentation, compare the details of the FREQ statement to the WEIGHT statement:

FREQ variable ;

Required Argument

variable

  • specifies a numeric variable whose value represents the frequency of the observation. If you use the FREQ statement, then the procedure assumes that each observation represents n observations, where n is the value of variable. If n is not an integer, then SAS truncates it. If n is less than 1 or is missing, then the procedure does not use that observation to calculate statistics.
  • The sum of the frequency variable represents the total number of observations.
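One way to read the FREQ rules quoted above is that the step behaves as if every row were physically repeated n times. The sketch below is an illustration of that reading, not part of the quoted documentation. It expands the sample data by hand and then tabulates without a FREQ statement. The data set names sample and expanded and the loop variable i are assumptions made for this example.

```sas
/* Sample data from the question; the data set name "sample" is an assumption. */
data sample;
  input id article sex count;
  datalines;
1 139 1 2
2 139 2 2
3 146 2 1
4 146 2 2
5 146 1 0
6 111 2 10
6 111 1 1
;
run;

/* Write each row FLOOR(count) times. Rows whose count is below 1 or
   missing fail the IF test and are not written at all, which mirrors
   the FREQ rules (truncate non-integers, skip values under 1). */
data expanded;
  set sample;
  if count >= 1 then do i = 1 to floor(count);
    output;
  end;
  drop i;
run;

/* A plain N on the expanded rows, with no FREQ statement. */
proc tabulate data=expanded;
  class article sex;
  table article, sex*n / misstext='0';
run;
```

The cell counts from this step should match those of the freq count version in the question, since the expansion does explicitly what FREQ does implicitly.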

and, for the WEIGHT statement:

WEIGHT variable ;

Required Argument

variable

  • specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. PROC TABULATE responds to weight values in accordance with the following table:
  • 0 : Counts the observation in the total number of observations
  • <0 : Converts the value to zero and counts the observation in the total number of observations
  • missing : Excludes the observation

To exclude observations that contain negative and zero weights from the analysis, use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by default.

Note: Prior to Version 7 of SAS, the procedure did not exclude the observations with missing weights from the count of observations.

Restrictions

  • To compute weighted quantiles, use QMETHOD=OS in the PROC statement.
  • PROC TABULATE will not compute MODE when a weight variable is active. Instead, try using PROC UNIVARIATE when MODE needs to be computed and a weight variable is active.

Interaction

  • If you use the WEIGHT= option in a VAR statement to specify a weight variable, then PROC TABULATE uses this variable instead to weight those VAR statement variables.

Tip

  • When you use the WEIGHT statement, consider which value of the VARDEF= option is appropriate. See the discussion of VARDEF=divisor and the calculation of weighted statistics in the Keywords and Formulas section of this document.
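The practical difference between the two statements shows up once an analysis variable is involved. The following is a hedged sketch only: the data set priced and the variable price are invented for illustration and are not part of the question. It runs the same table once with FREQ and once with WEIGHT so that N, SUMWGT and MEAN can be compared side by side.

```sas
/* Hypothetical data: "price" is an invented analysis variable. */
data priced;
  input article sex count price;
  datalines;
139 1 2 10
139 2 2 12
146 2 1 20
146 2 2 30
146 1 0 25
;
run;

/* FREQ: each row stands for COUNT observations, so N is the sum of COUNT.
   The row with count 0 is not used at all. */
proc tabulate data=priced;
  freq count;
  class article sex;
  var price;
  table article, sex*price*(n mean) / misstext='-';
run;

/* WEIGHT: N counts rows (the zero-weight row is still counted),
   SUMWGT reports the sum of the weights, and MEAN is weighted. */
proc tabulate data=priced;
  weight count;
  class article sex;
  var price;
  table article, sex*price*(n sumwgt mean) / misstext='-';
run;
```

For article 146 with sex 2, the FREQ step should report N = 3, while the WEIGHT step should report N = 2 with SUMWGT = 3. The mean is the same in both cases, (20 + 2*30)/3. Which N is the right one to present depends on whether count really stands for repeated observations.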
