
dplyr: error with rowwise mutate with NA

I am getting strange errors with row-wise mutate in dplyr. Here is an example:

library(dplyr)
set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2,'b'] <- NA

There is no trouble with sum, but summary functions such as mean and median are problematic:

mutate(rowwise(df), sum(a, b, na.rm = T)) # works

mutate(rowwise(df), mean(a, b, na.rm = T))
#! Error: missing value where TRUE/FALSE needed
mutate(rowwise(df), median(a, b, na.rm = T))
#! Error: unused argument (-0.820468384118015)

Now, let's put the NA in the first column instead:

set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2,'a'] <- NA

mutate(rowwise(df), sum(a, b, na.rm = T)) # works

mutate(rowwise(df), mean(a, b, na.rm = T))
#! no error, but returns `NaN`
mutate(rowwise(df), median(a, b, na.rm = T))
#! Error: unused argument (-0.820468384118015)

I am not sure whether I am doing something wrong here. I think the expected behavior should be the same as:

as.data.frame(apply(df, 1, mean, na.rm = T))
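For reference, a minimal base-R sketch of the row-wise mean expected above, rebuilt from the first example's data. rowMeans() is a swapped-in vectorised alternative (not mentioned in the question) that should give the same per-row values as apply(df, 1, mean, na.rm = TRUE).

```r
# Rebuild the first example's data
set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2, "b"] <- NA

# Row-wise mean via apply(), as in the question (NA dropped within each row)
by_apply <- apply(df, 1, mean, na.rm = TRUE)

# Vectorised equivalent; row 2 reduces to the single non-missing value a[2]
by_rowmeans <- rowMeans(df, na.rm = TRUE)

print(data.frame(by_apply, by_rowmeans))
```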

Thanks!

The problem is that you are calling mean and median incorrectly.

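While sum accepts any number of arguments and simply adds them all, mean and median take only ONE x argument to compute the mean/median of. For mean, the extra value is matched to the next parameter, trim: in mean(a, b, na.rm = T), b is used as the trimming fraction, which is why an NA in b produces "missing value where TRUE/FALSE needed", and why an NA in a (removed by na.rm = T) leaves nothing to average and gives NaN. In the R version used in the question, median had no parameter left to receive b, hence "unused argument".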
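A minimal base-R sketch of what happens to the extra argument (the values 2, 4 and NA_real_ are arbitrary illustrations, not from the question). mean() dispatches to mean.default(x, trim = 0, na.rm = FALSE, ...), so the second positional value becomes trim rather than a second data value; this reproduces both symptoms described above, assuming an English-language R session for the error text.

```r
# mean.default(x, trim = 0, na.rm = FALSE, ...): a second positional
# argument is matched to `trim`, not combined with `x`.
a <- 2
b <- 4

# b is used as `trim`, so this is the (trimmed) mean of the single value a
print(mean(a, b))       # 2, not 3

# Combining the values first gives the intended result
print(mean(c(a, b)))    # 3

# A numeric NA passed as `trim` makes the internal `trim > 0` test NA,
# which raises the error seen in the question
msg <- tryCatch(mean(a, NA_real_), error = function(e) conditionMessage(e))
print(msg)

# An NA in `x` with na.rm = TRUE leaves an empty vector, whose mean is NaN
print(mean(NA_real_, b, na.rm = TRUE))    # NaN
```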

Just as you would use mean(c(a, b)) rather than mean(a, b) to get the mean of the combined vector if a and b were vectors, you do the same here:

mutate(rowwise(df), mean = mean(c(a, b), na.rm = TRUE), med = median(c(a, b), na.rm = TRUE))
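To check what the corrected call computes for each row without loading dplyr, here is a base-R sketch; mapply() is a swapped-in technique, not part of the original answer. It evaluates mean(c(a, b), na.rm = TRUE) and median(c(a, b), na.rm = TRUE) once per (a, b) pair, which is what rowwise() arranges for the mutate() call above.

```r
# Rebuild the first example's data
set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2, "b"] <- NA

# Apply the corrected expressions to each (a, b) pair, one row at a time
df$mean <- mapply(function(a, b) mean(c(a, b), na.rm = TRUE), df$a, df$b)
df$med  <- mapply(function(a, b) median(c(a, b), na.rm = TRUE), df$a, df$b)

print(df)
```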

(Side note: you are only calculating the mean and median of two values at a time here, so the mean equals the median anyway.)
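A quick base-R check of this side note (the pairs below are arbitrary, one of them containing an NA): for an even number of values, median() averages the two middle ones, so for exactly two values it coincides with mean(), and with one NA dropped both reduce to the remaining value.

```r
pairs <- list(c(-0.6, 1.2), c(3, 7), c(NA, 5))

# For each pair, compare mean and median after dropping NA
same <- vapply(pairs, function(p) {
  isTRUE(all.equal(mean(p, na.rm = TRUE), median(p, na.rm = TRUE)))
}, logical(1))

print(same)
```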
