Error in group_by function in dplyr
I've looked through the related dplyr questions and the R documentation, and tried to sort out what I believe is a syntax misunderstanding.
Here is sample data that reflects the structure of my data.
id <- c(1:20)
xvar <- seq(from=2.0, to=6.0, length.out=100)
yvar <- c(1:100)
binary <- sample(x=c(0,1), size=100, replace=TRUE)
breaks <- c(0,11,21,31,41,51,61,71,81,91,100)
df <- data.frame(id, xvar, yvar, binary)
df <- transform(df, bin=cut(yvar, breaks))
  id     xvar yvar binary    bin
1  1 2.000000    1      1 (0,11]
2  2 2.040404    2      0 (0,11]
3  3 2.080808    3      0 (0,11]
4  4 2.121212    4      0 (0,11]
5  5 2.161616    5      1 (0,11]
6  6 2.202020    6      0 (0,11]
I'd like to run the following to see whether the xvar means, split by the binary variable, differ significantly within each bin group.
library(dplyr)
pval <- df %>% group_by(bin) %>% summarise(p.value=t.test(xvar ~ factor(binary))$p.value)
However, I keep getting the error: "grouping factor must have exactly 2 levels"
I saw a similar post, but the problem there was how t.test was being run. I've run this same code with a different group_by variable and it worked just fine. The data type was a factor and everything.
Any thoughts? I'd also appreciate critiques on how to improve the way this question was posed.
You don't want to use dplyr for this. You want to fit a linear model.
mod <- lm(xvar ~ binary*bin, data=df)
anova(mod)
For further discussion of what the coefficients, p-values and sums of squares mean, consider asking on stats.SE.
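As a rough sketch of how the model above could be read: the binary:bin interaction row of the ANOVA table is the one that speaks to whether the effect of binary on xvar differs between bins. The fixed seed and the name p_interaction are assumptions added here for illustration; the data mirrors the sample data from the question.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Sample data with the same structure as in the question
df <- data.frame(
  id     = rep(1:20, times = 5),
  xvar   = seq(from = 2.0, to = 6.0, length.out = 100),
  yvar   = 1:100,
  binary = sample(x = c(0, 1), size = 100, replace = TRUE)
)
df$bin <- cut(df$yvar, breaks = c(0, 11, 21, 31, 41, 51, 61, 71, 81, 91, 100))

# Main effects of binary and bin, plus their interaction
mod <- lm(xvar ~ binary * bin, data = df)
tab <- anova(mod)

# p-value of the interaction term (hypothetical variable name)
p_interaction <- tab["binary:bin", "Pr(>F)"]
print(tab)
```

This fits one model to all of the data instead of running a separate test per bin, so a sparse bin no longer stops the whole computation with an error.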
I think I've resolved the issue.
The error "grouping factor must have exactly 2 levels" appears whenever a group handed to t.test does not have enough data, specifically when the grouping factor within that group has only one level (for example, a bin in which binary is all 0s or all 1s). I had assumed that my original data set, which is large, would have enough data in every bin to avoid this.
When I made the sample data more robust, the error disappeared.
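One way to check for this situation is sketched below, assuming data shaped like the sample in the question. The seed, the step that forces one bin to hold a single binary value, and the helper safe_p are all hypothetical additions for illustration, not part of the original code.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Sample data with the same structure as in the question
df <- data.frame(
  id     = rep(1:20, times = 5),
  xvar   = seq(from = 2.0, to = 6.0, length.out = 100),
  yvar   = 1:100,
  binary = sample(x = c(0, 1), size = 100, replace = TRUE)
)
df$bin <- cut(df$yvar, breaks = c(0, 11, 21, 31, 41, 51, 61, 71, 81, 91, 100))

# Force the first bin to contain only binary == 0, to reproduce the failing case
df$binary[df$bin == "(0,11]"] <- 0

# Diagnostic: number of rows per bin for each binary value
counts <- table(df$bin, df$binary)
print(counts)

# Hypothetical helper: return NA instead of failing when either
# binary group has fewer than 2 observations
safe_p <- function(x, g) {
  if (min(table(factor(g, levels = c(0, 1)))) < 2) return(NA_real_)
  t.test(x ~ factor(g))$p.value
}

pval <- df %>%
  group_by(bin) %>%
  summarise(p.value = safe_p(xvar, binary))
print(pval)
```

Bins that cannot be tested come back with NA in p.value, which makes it easy to see which groups are too sparse.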
Sorry for the wasted time, and thank you for your help!