dplyr rowwise and mutate with custom function returns unexpected output

I have a dataframe that looks like this in R:

library(dplyr)

group <- c(1,2,3,4,5,6)
num_click <- c(33000, 34000, 35000, 33500, 34500, 32900)
num_open <- c(999000, 999500, 1000000, 1000050, 985000, 999999)
df <- data.frame(group, num_click, num_open)

> df
#  group num_click num_open
# 1     1     33000   999000
# 2     2     34000   999500
# 3     3     35000  1000000
# 4     4     33500  1000050
# 5     5     34500   985000
# 6     6     32900   999999

and I've written two trivial functions that I would like to apply to each row:

prop_test_ctr <- function(open, click){
  return(prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value)
}

add_one_to_group <- function(group) {
  return(group + 1)
}

The prop_test_ctr function uses the prop.test function from R's stats package to test the null hypothesis that the proportions in several groups are the same; $p.value is the output I am grabbing here, which corresponds to the p-value of the test.

The add_one_to_group function is a simple function that adds 1 to each group value in the df, so I can verify that rowwise() is working as expected.

When I try to build a new results dataframe by applying the two functions to each row using dplyr's rowwise() with the following:

results <- df %>%
  filter(group %in% c(1,2)) %>%
  rowwise() %>%
  mutate(p_value_ctr = prop_test_ctr(num_open,num_click),
         group_plus_one = add_one_to_group(group))

it yields this output:

results
# A tibble: 2 x 5
  group num_click num_open   p_value_ctr group_plus_one
* <dbl>     <dbl>    <dbl>         <dbl>          <dbl>
1     1     33000   999000 0.00004201837              2
2     2     34000   999500 0.00004201837              3

The p_value_ctr column is incorrect: instead of calculating a p-value for the clicks and opens of each row, it calculates a single p-value for groups 1 and 2 combined with the values hard-coded in the prop_test_ctr function (34000 and 999000).

The add_one_to_group function works as expected with rowwise(), but prop_test_ctr does not. The p-value in the p_value_ctr column is actually the same value I get when I run:

prop.test(c(33000, 34000, 34000), c(999000, 999500, 999000))$p.value

which suggests that the full num_click and num_open column vectors for groups 1 and 2 are being passed to the function, instead of the single value for each row (hence the use of rowwise()).
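To make the symptom concrete, here is a minimal base-R sketch (no dplyr involved) of what happens when prop_test_ctr receives whole columns rather than one row's values. The names p_whole_columns and p_per_row are made up for illustration, and mapply() is only one way to emulate a row-by-row call; it is not necessarily what rowwise() does internally.

```r
# Same function as in the question
prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Columns for groups 1 and 2 only
num_click <- c(33000, 34000)
num_open  <- c(999000, 999500)

# Whole columns passed in one call: prop.test sees three groups
# and returns a single p-value, which would then be recycled to every row
p_whole_columns <- prop_test_ctr(num_open, num_click)

# One call per row: two separate two-group tests, one p-value each
p_per_row <- mapply(prop_test_ctr, num_open, num_click)

length(p_whole_columns)
length(p_per_row)
```

If the output in the question matches p_whole_columns rather than p_per_row, that would be consistent with the row-wise grouping being ignored by whatever mutate was actually called.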

I know there are other ways to accomplish this, but I'm specifically curious whether I can stay within the dplyr universe here (as opposed to using sapply() and then cbind-ing those results to the original df, for example), because this seems like the intended behavior of rowwise(); I've probably just messed something up.

Thank you for your help!!

It looks like the problem was due to the mutate function being masked by another identically named function (most likely plyr::mutate). Restarting in a clean R session fixed the problem.
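Before restarting, it can help to check which mutate the session is actually using. The snippet below is a rough sketch using base R introspection functions; the variable names mutate_source and mutate_locations are only illustrative, and it assumes dplyr is installed.

```r
library(dplyr)

# Namespace that owns the `mutate` found first on the search path;
# this should be "dplyr" when nothing is masking it
mutate_source <- environmentName(environment(mutate))
mutate_source

# Every attached package or environment that defines something called `mutate`
mutate_locations <- find("mutate")
mutate_locations

# Overview of all objects that are masked somewhere on the search path
conflicts(detail = TRUE)
```

If mutate_source reports something other than "dplyr" (for example "plyr"), the unexpected result is probably coming from that package's mutate, which matches the diagnosis above.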

Thank you @user2738526 for your response! Looks like mutate being masked was the issue.

Because dplyr function names are so generic, I often qualify them with dplyr:: even when I've attached the package.
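As a sketch of that habit applied to the pipeline from the question (assuming dplyr is installed; the trailing dplyr::ungroup() is an addition that was not in the original code), every verb is namespace-qualified so that a masking package cannot change which function runs:

```r
library(dplyr)

df <- data.frame(
  group     = c(1, 2, 3, 4, 5, 6),
  num_click = c(33000, 34000, 35000, 33500, 34500, 32900),
  num_open  = c(999000, 999500, 1000000, 1000050, 985000, 999999)
)

prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

results <- df %>%
  dplyr::filter(group %in% c(1, 2)) %>%
  dplyr::rowwise() %>%
  # Explicit namespace: evaluated once per row, regardless of masking
  dplyr::mutate(p_value_ctr = prop_test_ctr(num_open, num_click)) %>%
  # Drop the row-wise grouping so later steps behave normally
  dplyr::ungroup()

results
```

With this form the two rows should get different p-values, since each row is tested separately against the hard-coded 34000 and 999000.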
