

dplyr - names of levels of factor not being passed properly within mutate when using rowwise()

First, I am very new to R, and I'm aware that I may be making an obvious mistake. I have searched for an answer, but maybe I'm searching for the wrong thing.

I am trying to apply a function that adds a new column to a data frame based on the contents of each row. But it looks to me like the values in the row are not being handled properly in the mutate function when using rowwise. I've tried to create a toy example to demonstrate my problem.

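library(dplyr)
x <- c("A", "B")
y <- c(1, 2)
# R >= 4.0.0 no longer turns strings into factors by default;
# stringsAsFactors = TRUE reproduces the factor column this question is about
df <- data.frame(x, y, stringsAsFactors = TRUE)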

Then I have a function to create a new column called z, which adds 1 to y if the value of x is "A" and adds 2 to y if the value of x is "B". Note that I have added print(x) to show what is going on.

calculatez <- function(x, y) {
  print(x)  # show what the function actually receives
  if (x == "A") {
    return(y + 1)
  } else {
    return(y + 2)
  }
}

I then try to use mutate:

df %>%
  rowwise() %>%
  mutate(z = calculatez(x,y))

and I get the following: 2 has been added to both rows, rather than 1 to the first row, and "A" and "B" have been passed into the function as 1 and 2.

[1] 1
[1] 2
Source: local data frame [2 x 3]
Groups: 

  x y z
1 A 1 3
2 B 2 4
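The printed 1 and 2 look like a factor's internal integer codes rather than its labels. That reading is an inference from the output, not something the question confirms, but this base-R sketch (the object f is purely illustrative) shows why losing the factor class would send both rows to the else branch:

```r
# A factor stores integer codes plus a set of level labels.
f <- factor(c("A", "B"))

as.integer(f)    # 1 2      (the underlying codes)
as.character(f)  # "A" "B"  (the labels)

# While the factor class is kept, == compares labels:
f[1] == "A"      # TRUE

# Once the class is stripped, only the code is left, so the test fails
# and calculatez() would fall through to its else branch:
unclass(f)[1] == "A"  # FALSE, because 1 is compared as "1" with "A"
```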

If I remove the rowwise() function, "A" and "B" appear to be passed properly, but clearly I don't get the right result.

df %>%
  mutate(z = calculatez(x,y))

[1] A B
Levels: A B
  x y z
1 A 1 2
2 B 2 3
Warning message:
In if (x == "A") { :
  the condition has length > 1 and only the first element will be used

I can get it to work without writing my own function, and then I don't get the warning about the length of the condition. So I don't think I properly understand what rowwise() is doing.

df %>%
  mutate(z = ifelse(x=="A",y+1,y+2))

  x y z
1 A 1 2
2 B 2 4
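The difference comes from vectorisation. In this minimal base-R sketch (the objects mirror the toy data and are illustrative only), == and ifelse() work element by element, while if() expects a single TRUE or FALSE. Note that since R 4.2.0 a condition of length greater than one is an error rather than the warning shown above.

```r
x <- factor(c("A", "B"))
y <- c(1, 2)

# == is vectorised: one TRUE/FALSE per row (labels are compared for factors)
cond <- x == "A"
print(cond)  # TRUE FALSE

# ifelse() is vectorised too, so it picks a branch per element
z <- ifelse(cond, y + 1, y + 2)
print(z)     # 2 4

# if() needs a single TRUE/FALSE. Older R used only cond[1] and warned;
# R 4.2.0 and later signal an error instead.
res <- tryCatch(
  if (cond) "used first element" else "other",
  warning = function(w) "warning",
  error = function(e) "error"
)
print(res)
```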

But I want to be able to use my own function, because in my real application the condition is more complicated, and lots of nested ifelse calls inside mutate would be difficult to read.
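One way to keep a named function without needing rowwise() is to write it so it works on whole vectors. This is a sketch under assumptions: the name calculatez_vec and the named-vector lookup table are illustrative choices, not part of the question, and the increments 1 and 2 are taken from the toy example.

```r
# Hypothetical vectorised variant of calculatez(): it works on whole
# columns at once, so mutate() can call it without rowwise().
calculatez_vec <- function(x, y) {
  x <- as.character(x)  # compare labels even if x arrives as a factor
  # Named lookup table: one increment per category. Adding a category
  # means adding an entry here instead of nesting another ifelse().
  increments <- c(A = 1, B = 2)
  y + unname(increments[x])
}

calculatez_vec(factor(c("A", "B", "A")), c(1, 2, 3))  # 2 4 4
# with dplyr: df %>% mutate(z = calculatez_vec(x, y))
```

Unlike calculatez(), a value of x that is missing from increments gives NA instead of falling into an else branch, which can make unexpected categories easier to spot.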

I can get around the problem by changing my condition to if(x==1), but that will make my code difficult to understand.

I don't want to waste your time, so sorry if I'm missing something obvious. Any tips on where I'm going wrong?

You could use rowwise() with do():

df %>%
  rowwise() %>%
  do(data.frame(., z = calculatez(.$x, .$y)))

which gives the output:

     x y z
  #1 A 1 2
  #2 B 2 4

Or you could do:

df %>%
  group_by(N = row_number()) %>%
  mutate(z = calculatez(x, y)) %>%
  ungroup() %>%
  select(-N)
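Grouping by row_number() works because each group holds a single row, so calculatez() receives length-one vectors. The same one-call-per-row idea can be sketched in base R without dplyr. This vapply() loop is a swapped-in alternative rather than part of the answer, and calculatez() is redefined here without its print() call.

```r
# Same data as the question; stringsAsFactors = TRUE makes x a factor
# regardless of the R version.
df <- data.frame(x = c("A", "B"), y = c(1, 2), stringsAsFactors = TRUE)

calculatez <- function(x, y) {
  if (x == "A") y + 1 else y + 2
}

# Call calculatez() once per row, like group_by(N = row_number()).
# df$x[i] uses [, which keeps the factor class, so x == "A" compares labels.
df$z <- vapply(seq_len(nrow(df)),
               function(i) calculatez(df$x[i], df$y[i]),
               numeric(1))
df
```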

Using a different dataset:

df <- structure(list(x = structure(c(1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), y = c(1, 2, 1, 2, 1)), .Names = c("x", 
"y"), row.names = c(NA, -5L), class = "data.frame")
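The structure() call above is dput() output. Assuming nothing beyond what it encodes (a factor x with levels "A" and "B", and a numeric y), a more readable equivalent would be:

```r
# Readable equivalent of the dput()/structure() output above:
# x is a factor with levels "A" and "B", y is numeric.
df <- data.frame(
  x = factor(c("A", "A", "B", "B", "B")),
  y = c(1, 2, 1, 2, 1)
)
str(df)
```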

Running the above code gives:

 #  x y z
 #1 A 1 2
 #2 A 2 3
 #3 B 1 3
 #4 B 2 4
 #5 B 1 3

If you are using data.table:

library(data.table)
setDT(df)[, z := calculatez(x,y), by=seq_len(nrow(df))]
df
#    x y z
# 1: A 1 2
# 2: A 2 3
# 3: B 1 3
# 4: B 2 4
# 5: B 1 3
