R tilde operator: What does ~0+a mean?
I have seen how to use the ~ operator in a formula. For example,
y~x
means: y is distributed as x.

However, I am really confused about what ~0+a means in this code:
require(limma)
a = factor(1:3)
model.matrix(~0+a)
Why doesn't just model.matrix(a) work? Why is the result of model.matrix(~a) different from that of model.matrix(~0+a)? And finally, what is the meaning of the ~ operator here?
~ creates a formula: it separates the right-hand and left-hand sides of a formula.
From ?`~`:
Tilde is used to separate the left- and right-hand sides in model formula
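To see concretely that ~ only builds an object with a left-hand and a right-hand side, you can create a formula and index into it. This is a minimal sketch in base R; y, x and a are placeholder names that need not exist, because a formula is not evaluated when it is created.

```r
# ~ builds an unevaluated formula object; y and x do not have to exist yet.
f <- y ~ x
class(f)   # "formula"
f[[2]]     # left-hand side: y
f[[3]]     # right-hand side: x

# A one-sided formula such as ~0 + a has only a right-hand side.
g <- ~0 + a
length(f)  # 3: the ~ call, the left-hand side and the right-hand side
length(g)  # 2: the ~ call and the right-hand side
g[[2]]     # the unevaluated expression 0 + a
```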
Quoting from the help for formula:
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin. A model with no intercept can be also specified as y ~ x + 0 or y ~ 0 + x.
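One way to check these expansion rules yourself is to let terms() expand a formula and then inspect its "term.labels" and "intercept" attributes. This is a minimal sketch using only base R. term_labels() is a hypothetical helper name, and a, b, c, x and y are placeholders that never need to hold data.

```r
# terms() expands a formula symbolically; the variables need not exist.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(~ a * b)                # "a" "b" "a:b"
term_labels(~ (a + b + c)^2 - a:b)  # main effects a, b, c plus a:c and b:c

# The "intercept" attribute is 1 when the intercept is kept, 0 when it is removed.
intercepts <- sapply(list(y ~ x, y ~ x - 1, y ~ x + 0, y ~ 0 + x),
                     function(f) attr(terms(f), "intercept"))
intercepts  # 1 0 0 0
```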
~a+0

For your specific question: a is a factor, so model.matrix(~a) will return an intercept column, which stands in for the first level a1, plus indicator columns for a2 and a3 (when an intercept is present, you need only n-1 indicators to fully specify n classes). model.matrix(~0+a) removes the intercept, so each level gets its own indicator column: a1, a2 and a3.

The help files for each function are well written, detailed and easy to find!
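The difference becomes clear when the two design matrices are put side by side. This sketch assumes R's default treatment contrasts (contr.treatment) and uses only model.matrix() from base R's stats package, so limma is not needed. with_int, no_int and a1_rebuilt are illustrative names.

```r
# model.matrix() is part of base R's stats package; limma is not required.
a <- factor(1:3)

with_int <- model.matrix(~a)      # intercept plus indicators for every level except the first
no_int   <- model.matrix(~0 + a)  # no intercept: one 0/1 indicator column per level

colnames(with_int)  # "(Intercept)" "a2" "a3"
colnames(no_int)    # "a1" "a2" "a3"

# Both matrices span the same space: the a1 indicator equals the intercept minus a2 and a3.
a1_rebuilt <- with_int[, "(Intercept)"] - with_int[, "a2"] - with_int[, "a3"]
all(a1_rebuilt == no_int[, "a1"])  # TRUE
```

Because the two parameterisations span the same columns, they describe the same fitted values. Only the meaning of the coefficients changes: baseline plus differences with the intercept, or one mean per level without it.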
Why doesn't model.matrix(a) work?

model.matrix(a) doesn't work because a is a factor variable, not a formula or terms object.
From the help for model.matrix:

object: an object of an appropriate class. For the default method, a model formula or a terms object.
R is looking for a particular class of object; by passing the formula ~a, you are passing an object of class formula. model.matrix(terms(~a)) would also work (passing the terms object corresponding to the formula ~a).
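You can check the class requirement directly. This base-R sketch compares the classes involved and confirms that a formula and its terms object give the same design matrix. It also uses tryCatch() to show that a bare factor is rejected. The exact error text comes from R internals, so only the fact that an error occurs is checked.

```r
a <- factor(1:3)

class(a)           # "factor": neither a formula nor a terms object
class(~a)          # "formula"
class(terms(~a))   # "terms" "formula"

# A formula and its terms object produce the same design matrix.
m_formula <- model.matrix(~a)
m_terms   <- model.matrix(terms(~a))
all.equal(m_formula, m_terms)  # TRUE

# Passing the bare factor fails; tryCatch() returns the error object instead of stopping.
res <- tryCatch(model.matrix(a), error = function(e) e)
inherits(res, "error")  # TRUE
```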
As @BenBolker helpfully notes in his comment, this is a modified version of Wilkinson-Rogers notation.
There is a good description in the Introduction to R manual.
After reading several manuals, I was still confused by the meaning of model.matrix(~0+x) until recently, when I found this excellent book chapter.
In mathematics, 0+a is equal to a, so writing a term like 0+a looks very strange. However, here we are dealing with linear models: a simple high-school equation such as y=ax+b describes the relationship between the predictor variable (x) and the observation (y).
So we can think of ~0+x (or equivalently ~x+0) in terms of an equation of the form y=ax+b. Adding 0 forces b to be zero, which means we are looking for a line passing through the origin (no intercept), i.e. y=ax. If we specify a model like ~x+1, or just ~x, the fitted equation can contain a non-zero term b.
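The effect of forcing b to zero shows up directly in the fitted coefficients. Here is a minimal sketch with hypothetical, noise-free toy data lying exactly on y = 2x + 3. With an intercept, lm() recovers both terms. Without one, the slope becomes the least-squares fit through the origin, sum(x*y)/sum(x^2).

```r
# Hypothetical toy data: an exact line with slope 2 and intercept 3.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x + 3

fit_int   <- lm(y ~ x)      # b is estimated
fit_int1  <- lm(y ~ x + 1)  # same model, with the intercept written explicitly
fit_noint <- lm(y ~ 0 + x)  # b forced to 0: the line must pass through the origin

coef(fit_int)    # (Intercept) 3 and x 2, up to floating-point rounding
coef(fit_noint)  # only x, equal to sum(x * y) / sum(x^2)

predict(fit_noint, newdata = data.frame(x = 0))  # 0 by construction
```

With these values, the no-intercept slope is 155/55 (about 2.818) rather than 2, because the line is forced through the origin.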
Equally, we may restrict b with the formula ~x-1 or ~-1+x, which both mean: no intercept (the same way we exclude a row or column in R with a negative index). However, something like ~x-2 or ~x+3 is meaningless.
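You can check that the different ways of removing the intercept are equivalent by building each design matrix. Here is a minimal sketch in base R with hypothetical toy values for x. model.matrix() looks up x in the environment where each formula was created.

```r
# Hypothetical toy predictor values.
x <- c(1, 2, 3, 4, 5)

# Four spellings of "no intercept", each turned into a design matrix.
designs <- lapply(list(~x - 1, ~-1 + x, ~x + 0, ~0 + x), model.matrix)

sapply(designs, colnames)  # "x" "x" "x" "x": no "(Intercept)" column in any of them
```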
Thanks to @mnel for the useful comment. Finally, what's the reason to use ~ and not =? In standard mathematical terminology / symbology, y~x denotes that y is equivalent to x; it is somewhat weaker than y=x. When you are fitting a linear model, you aren't really saying y=x, but rather that you can model y as a linear function of x (y = ax+b, for example).
To answer part of your question, the tilde is used to separate the left- and right-hand sides in a model formula. See ?"~" for more help.
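Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.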