dplyr rowwise and mutate with custom function returns unexpected output
I have a data frame that looks like this in R:
library(dplyr)
group <- c(1,2,3,4,5,6)
num_click <- c(33000, 34000, 35000, 33500, 34500, 32900)
num_open <- c(999000, 999500, 1000000, 1000050, 985000, 999999)
df <- data.frame(group, num_click, num_open)
> df
# group num_click num_open
# 1 1 33000 999000
# 2 2 34000 999500
# 3 3 35000 1000000
# 4 4 33500 1000050
# 5 5 34500 985000
# 6 6 32900 999999
and I've written two trivial functions that I would like to apply to each row:
prop_test_ctr <- function(open, click) {
  return(prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value)
}
add_one_to_group <- function(group) {
  return(group + 1)
}
The prop_test_ctr function uses the prop.test function from R's stats package to test the null hypothesis that the proportions in several groups are the same; $p.value is the output value I am grabbing here, which corresponds to the p-value of the test.
The add_one_to_group function simply adds 1 to each group value in the df so I can verify that rowwise() is working as expected.
When I try to build a new results data frame by applying the two functions to each row using dplyr's rowwise() with the following:
results <- df %>%
  filter(group %in% c(1, 2)) %>%
  rowwise() %>%
  mutate(p_value_ctr = prop_test_ctr(num_open, num_click),
         group_plus_one = add_one_to_group(group))
it yields this output:
results
# A tibble: 2 x 5
  group num_click num_open   p_value_ctr group_plus_one
* <dbl>     <dbl>    <dbl>         <dbl>          <dbl>
1     1     33000   999000 0.00004201837              2
2     2     34000   999500 0.00004201837              3
Here the p_value_ctr column is incorrect: instead of calculating the p-value for the difference in clicks and opens for each row, it calculates the p-value for the combination of groups 1 and 2 (the two rows kept by the filter) and the values hard-coded in the prop_test_ctr function (34000 and 999000).
The add_one_to_group function works as expected with rowwise(), but prop_test_ctr does not. The p-value that prop_test_ctr returns is actually the same value I get if I run:
prop.test(c(33000, 34000, 34000), c(999000, 999500, 999000))$p.value
which suggests that the whole num_click and num_open column vectors for both groups 1 and 2 are being passed to the function, instead of the intended column values for just one row (hence the use of rowwise()).
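One way to check this suspicion without dplyr, as a minimal sketch: call prop_test_ctr directly on the full columns for groups 1 and 2 and compare the result with the three-group prop.test call above. The names whole_column and three_group are illustrative only and do not appear in the original post.

```r
# Same helper as in the question.
prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Pass the whole columns for groups 1 and 2 at once,
# which is what a mutate that ignores rowwise() would do.
whole_column <- prop_test_ctr(open = c(999000, 999500),
                              click = c(33000, 34000))

# The three-group test from the question.
three_group <- prop.test(c(33000, 34000, 34000),
                         c(999000, 999500, 999000))$p.value

# Compare the two p-values.
all.equal(whole_column, three_group)
```

If the two values agree, the function is receiving entire columns rather than one row at a time. add_one_to_group would not reveal this, because group + 1 is vectorised and gives the same per-row result either way.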
I know there are other ways to accomplish this, but I'm specifically curious whether I can stay within the dplyr universe here (as opposed to using sapply() and then cbind-ing those results to the original df, for example), because it seems like this should be the intended behavior of rowwise(); I've just messed something up.
Thank you for your help!!
It looks like the problem was due to the mutate function being masked by another identically named function (most likely plyr::mutate). Restarting in a clean R session fixed the problem.
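A hedged sketch of how one might check for this kind of masking before restarting: the base R calls below report which package the bare name mutate currently resolves to. It assumes dplyr is installed; in a session where plyr was attached after dplyr, the first result would be expected to name plyr instead.

```r
library(dplyr)

# Which namespace does the bare name `mutate` resolve to right now?
environmentName(environment(mutate))

# Every attached package that defines `mutate`, in search-path order;
# the first entry is the one that wins.
find("mutate")
```

plyr::mutate knows nothing about dplyr's row-wise grouping, so it evaluates each expression once on whole columns, which is consistent with the identical p-values in the question.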
Thank you @user2738526 for your response! Looks like mutate being masked was the issue.
Because of the generic nature of dplyr function names, I often qualify them with dplyr:: even when I've attached the package.
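As an illustration of that habit, here is a sketch of the pipeline from the question with the dplyr verbs namespace-qualified, so that a later library(plyr) cannot change which mutate runs. The data frame is cut down to the two filtered rows, and the trailing dplyr::ungroup() is an addition, not part of the original code.

```r
library(dplyr)

# Only the two rows the question filters down to.
df <- data.frame(
  group     = c(1, 2),
  num_click = c(33000, 34000),
  num_open  = c(999000, 999500)
)

prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Explicit dplyr:: prefixes pin the verbs to dplyr regardless of masking.
results <- df %>%
  dplyr::rowwise() %>%
  dplyr::mutate(p_value_ctr = prop_test_ctr(num_open, num_click)) %>%
  dplyr::ungroup()

results
```

With dplyr's mutate in effect, each row's p_value_ctr should equal a direct two-sample prop.test call on that row's values alone, so the two rows no longer share one p-value.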