
R tilde operator: What does ~0+a mean?

I have seen how the ~ operator is used in a formula. For example, y~x means: y is distributed as x.

However, I am really confused about what ~0+a means in this code:

require(limma)
a = factor(1:3)
model.matrix(~0+a)

Why does model.matrix(a) alone not work? Why is the result of model.matrix(~a) different from model.matrix(~0+a)? And finally, what is the meaning of the ~ operator here?

~ creates a formula: it separates the left-hand and right-hand sides of a formula.

From ?`~`:

Tilde is used to separate the left- and right-hand sides in a model formula.
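As a quick check of that statement, here is a minimal sketch (the variable names are only for illustration) showing that ~ builds an object of class formula without evaluating either side, and that a one-sided formula such as ~0+a simply has no left-hand side:

```r
# Neither y, x nor a needs to exist: ~ only captures the expression.
f_two_sided <- y ~ x
f_one_sided <- ~ 0 + a

class(f_two_sided)     # "formula"
length(f_two_sided)    # 3: the `~` call, the left-hand side, the right-hand side
length(f_one_sided)    # 2: the `~` call and the right-hand side only
all.vars(f_one_sided)  # "a"
```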

Quoting from the help for formula:

The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin. A model with no intercept can be also specified as y ~ x + 0 or y ~ 0 + x.
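If you want to see these expansions for yourself, one way is to inspect the terms object built from a formula. This is only a sketch: expand is a hypothetical helper defined here for illustration, not part of R.

```r
# term.labels lists the terms left after the formula operators are expanded.
# `expand` is a made-up helper name for this example.
expand <- function(f) attr(terms(f), "term.labels")

expand(~ a * b)                # a, b and the interaction a:b
expand(~ (a + b + c)^2)        # main effects plus all two-way interactions
expand(~ (a + b + c)^2 - a:b)  # the same, with a:b removed

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise.
attr(terms(y ~ x), "intercept")      # 1
attr(terms(y ~ x - 1), "intercept")  # 0
attr(terms(y ~ 0 + x), "intercept")  # 0
```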

So, regarding the specific issue with ~a+0:

  • You are creating a model matrix without an intercept. As a is a factor, model.matrix(~a) will return an intercept column that takes the place of a1, the reference level (you need n-1 indicators to fully specify n classes).
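The difference is easy to see by comparing the column names of the two matrices. This sketch assumes R's default treatment contrasts for unordered factors; with other contrast settings the columns of the first matrix would differ.

```r
a <- factor(1:3)

# Default (treatment) contrasts: an (Intercept) column plus indicators for
# levels 2 and 3; level 1 is the reference level absorbed into the intercept.
with_intercept <- model.matrix(~ a)
colnames(with_intercept)     # "(Intercept)" "a2" "a3"

# No intercept: one indicator column per level.
without_intercept <- model.matrix(~ 0 + a)
colnames(without_intercept)  # "a1" "a2" "a3"
```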

The help files for each function are well written, detailed and easy to find!

Why doesn't model.matrix(a) work?

model.matrix(a) doesn't work because a is a factor variable, not a formula or terms object.

From the help for model.matrix:

object: an object of an appropriate class. For the default method, a model formula or a terms object.

R is looking for a particular class of object; by passing the formula ~a you are passing an object of class formula. model.matrix(terms(~a)) would also work (passing the terms object corresponding to the formula ~a).
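A small sketch of the three calls side by side, using tryCatch so the failing call does not stop the script (the exact error message may vary between R versions, so it is not shown here):

```r
a <- factor(1:3)

# A bare factor is neither a formula nor a terms object, so this signals an error.
res <- tryCatch(model.matrix(a), error = function(e) e)
inherits(res, "error")  # TRUE

# Both of these give the default method something it understands.
m_formula <- model.matrix(~ a)
m_terms   <- model.matrix(terms(~ a))

# Compare the matrix values, ignoring attributes.
all.equal(m_formula, m_terms, check.attributes = FALSE)
```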


General note

@BenBolker helpfully notes in his comment that this is a modified version of Wilkinson-Rogers notation.

There is a good description in An Introduction to R.

After reading several manuals, I was still confused by the meaning of model.matrix(~0+x), until I recently found this excellent book chapter.

In mathematics, 0+a is equal to a, and writing a term like 0+a looks very strange. However, here we are dealing with linear models: a simple high-school equation such as y=ax+b that describes the relationship between the predictor variable (x) and the observation (y).

So we can think of ~0+x, or equally ~x+0, as an equation of the form y=ax+b. By adding 0 we are forcing b to be zero, which means that we are looking for a line passing through the origin (no intercept). If we specified a model like ~x+1 or just ~x, the fitted equation could possibly contain a non-zero term b. Equally, we may restrict b with a formula ~x-1 or ~-1+x, which both mean: no intercept (the same way we exclude a row or column in R by a negative index). However, something like ~x-2 or ~x+3 is meaningless.
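To make this concrete, here is a minimal sketch with made-up data (roughly y = 2x + 3; the numbers are invented for illustration only) that fits the line with and without the intercept b:

```r
# Made-up data, roughly y = 2x + 3.
x <- 1:5
y <- c(5.1, 6.9, 9.2, 10.8, 13.1)

fit_free   <- lm(y ~ x)      # the intercept b is estimated
fit_origin <- lm(y ~ 0 + x)  # b is forced to zero

names(coef(fit_free))    # "(Intercept)" "x"
names(coef(fit_origin))  # "x"

# ~0+x, ~x-1 and ~-1+x all describe the same no-intercept model.
all.equal(coef(lm(y ~ x - 1)), coef(fit_origin))
all.equal(coef(lm(y ~ -1 + x)), coef(fit_origin))
```

Because the data do not really pass through the origin, the slope of fit_origin differs from the slope of fit_free: dropping the intercept changes the fitted model, not just its notation.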

Thanks to @mnel for the useful comment. Finally, why use ~ and not =? In standard mathematical notation, y~x denotes that y is equivalent to x, which is somewhat weaker than y=x. When you are fitting a linear model, you aren't really saying y=x, but rather that you can model y as a linear function of x (y = ax+b, for example).

To answer part of your question, tilde is used to separate the left- and right-hand sides in a model formula. See ?"~" for more help.


 