
Use of ~ (tilde) in the R programming language

In a tutorial about regression modeling, I saw the following command:

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

What exactly does this command do, and what is the role of ~ (tilde) in it?

The expression to the right of <- is a formula object. It is often used to denote a statistical model, where the term on the left of the ~ is the response and the terms on the right of the ~ are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".

The myFormula <- part of that line stores the formula in an object called myFormula so you can use it in other parts of your R code.
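As a minimal sketch of what "a formula object" means in practice, the stored formula can be inspected with base R functions. Creating the formula does not need the iris data, because the variable names are kept unevaluated.

```r
# Same formula as in the question
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

class(myFormula)     # "formula"
all.vars(myFormula)  # names of every variable the formula mentions
length(myFormula)    # 3: the `~` operator, the left side, the right side
```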


Other common uses of formula objects in R

The lattice package uses them to specify the variables to plot.
The ggplot2 package uses them to specify panels for plotting.
The dplyr package uses them for non-standard evaluation.
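To illustrate the first item, here is a small sketch using lattice, assuming the package is installed (it normally ships with R as a recommended package). The choice of iris columns is only for illustration. The part after | names the conditioning variable, giving one panel per species.

```r
library(lattice)

# y ~ x | g: plot Sepal.Length against Petal.Length, one panel per Species
p <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris)
print(p)
```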

R defines a ~ (tilde) operator for use in formulas. Formulas have all sorts of uses, but perhaps the most common is for regression:

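library(datasets)  # provides the iris data set
# Caution: Species is a factor and lm() expects a numeric response, so this call produces warnings
lm(myFormula, data = iris)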

help("~") or help("formula") will teach you more.
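Since lm() is designed for a numeric response, a regression on iris might look more like the sketch below. The choice of Sepal.Length as the response is an assumption made for illustration, not something taken from the tutorial in the question.

```r
# Numeric response on the left, explanatory variables on the right
fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

coef(fit)     # intercept plus one slope per explanatory variable
summary(fit)  # standard errors, R-squared, and so on
```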

@Spacedman has covered the basics. Let's discuss how it works.

First, note that, being an operator, it is essentially a shortcut to a function (with two arguments):

> `~`(lhs,rhs)
lhs ~ rhs
> lhs ~ rhs
lhs ~ rhs

That can be helpful to know when using it in, e.g., the apply family of functions.
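One way this might be used, sketched with names chosen only for illustration: because `~` is an ordinary function, do.call() can invoke it inside lapply() to build one formula per response variable. In everyday code, reformulate() from the stats package is a more common way to do the same thing.

```r
responses <- c("Sepal.Length", "Sepal.Width")

# Call `~` like any other two-argument function; as.name() turns strings into symbols
formulas <- lapply(responses, function(r) {
  do.call("~", list(as.name(r), as.name("Petal.Length")))
})

# Fit one model per generated formula
fits <- lapply(formulas, function(f) lm(f, data = iris))
formulas
```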

Second, you can manipulate the formula as text:

oldform <- as.character(myFormula)  # three strings: "~", the left side, the right side
myFormula <- as.formula(paste(oldform[2], "Sepal.Length", sep = "~"))  # keep the response, replace the right side
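As an alternative to pasting strings, base R's update() can rewrite a formula directly. This is a different technique from the text manipulation above, shown here as a sketch. In the second argument, a dot stands for "whatever was already on this side".

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# Keep the response (the dot on the left), replace the whole right side
newFormula <- update(myFormula, . ~ Sepal.Length)
newFormula

# Keep both sides, but drop one term from the right
smaller <- update(myFormula, . ~ . - Petal.Width)
smaller
```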

Third, you can manipulate it as a list:

myFormula[[2]]  # left-hand side (the response)
myFormula[[3]]  # right-hand side (the explanatory terms)
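A short sketch of the list view, using a throwaway formula f so that it runs on its own: the first element is the `~` operator itself, and elements can be replaced with quoted expressions.

```r
f <- Species ~ Sepal.Length + Sepal.Width

f[[1]]  # the `~` operator
f[[2]]  # left-hand side: Species
f[[3]]  # right-hand side: Sepal.Length + Sepal.Width

# Replace the right-hand side with a quoted expression
f[[3]] <- quote(Petal.Length)
f
```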

Finally, there are some helpful tricks with formulae (see help("formula") for more):

myFormula <- Species ~ . 

For example, the version above is the same as the original version, since the dot means "all variables not yet used." R looks at the data.frame you pass to your eventual model call, sees which variables exist in the data.frame but aren't explicitly mentioned in your formula, and replaces the dot with those remaining variables.
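The expansion of the dot can be checked without fitting a model. As a sketch, terms() resolves the dot when given the data frame, here the built-in iris data:

```r
f <- Species ~ .

# terms() replaces the dot with every column of iris except the response
expanded <- terms(f, data = iris)
attr(expanded, "term.labels")
formula(expanded)
```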

In short,

The tilde (~) separates the left side of a formula from the right side.

For example, in a linear model it separates the dependent variable from the independent variables and can be read as "as a function of." So, to model a person's wages (wages) as a function of their years of education (years_of_education), we write something like:

wages ~ years_of_education

Here,

Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

means that Species is modeled as a function of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.
