
dplyr: names of factor levels not being passed properly within mutate() when using rowwise()

First, I am very new to R, and I'm aware that I may be making an obvious mistake. I have searched for an answer, but maybe I'm searching for the wrong thing.

I am trying to apply a function that adds a new column to a data frame based on the contents of each row. But it looks to me like the values in the row are not being handled properly in the mutate function when using rowwise(). I've tried to create a toy example to demonstrate my problem.

library(dplyr)
x <- c("A", "B")
y <- c(1, 2)
df <- data.frame(x, y, stringsAsFactors = TRUE)  # x becomes a factor (the default before R 4.0.0)

Then I have a function to create a new column called z, which adds 1 to y if the value of x is "A" and adds 2 to y if the value of x is "B". Note that I have added print(x) to show what is going on.

calculatez <- function(x, y) {
  print(x)  # show what the function actually receives
  if (x == "A") {
    return(y + 1)
  } else {
    return(y + 2)
  }
}

I then try to use mutate():

df %>%
  rowwise() %>%
  mutate(z = calculatez(x,y))

and I get the following. 2 has been added to both rows, rather than 1 to the first row, and "A" and "B" have been passed into the function as 1 and 2.

[1] 1
[1] 2
Source: local data frame [2 x 3]
Groups: 

  x y z
1 A 1 3
2 B 2 4
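The 1 and 2 printed by print(x) look like the factor's internal integer codes rather than its labels. A factor stores integer codes plus a table of levels, which a minimal base-R check can show, assuming x is stored as a factor as in the data frame above:

```r
# Rebuild the toy data; stringsAsFactors = TRUE makes x a factor
# (this was the data.frame() default before R 4.0.0)
df <- data.frame(x = c("A", "B"), y = c(1, 2), stringsAsFactors = TRUE)

print(class(df$x))        # "factor"
print(levels(df$x))       # the labels: "A" "B"
print(as.integer(df$x))   # the codes stored underneath: 1 2
print(as.character(df$x)) # back to the labels: "A" "B"
```

If the function receives the codes, then 1 == "A" and 2 == "A" are both FALSE, so the else branch runs for every row, which matches the 2 being added everywhere.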

If I remove rowwise(), "A" and "B" appear to be passed properly, but clearly I don't get the right result.

df %>%
  mutate(z = calculatez(x,y))

[1] A B
Levels: A B
  x y z
1 A 1 2
2 B 2 3
Warning message:
In if (x == "A") { :
  the condition has length > 1 and only the first element will be used
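The warning comes from if() itself: without rowwise(), calculatez receives the whole column, so x == "A" is a logical vector of length 2, and if() only looks at its first element. Because that element is TRUE, y + 1 is applied to the whole y column, giving 2 and 3. (Since R 4.2.0 a condition of length greater than one is an error rather than a warning.) A small check of the comparison, and of why ifelse() behaves differently:

```r
x <- factor(c("A", "B"))
y <- c(1, 2)

cond <- x == "A"        # the comparison is vectorised over the whole column
print(cond)             # TRUE FALSE
print(length(cond))     # 2, but if() needs a single TRUE/FALSE

# ifelse() is vectorised, so it picks a branch element by element
z <- ifelse(cond, y + 1, y + 2)
print(z)                # 2 4
```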

I can get it to work if I don't write my own function, and then I also don't get the warning about the length of the condition. So I don't think I understand properly what rowwise() is doing.

df %>%
  mutate(z = ifelse(x=="A",y+1,y+2))

  x y z
1 A 1 2
2 B 2 4

But I want to be able to use my own function, because in my real application the condition is more complicated, and it would be difficult to read with lots of nested ifelse() calls inside mutate().
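One alternative to nested ifelse() calls, assuming dplyr is available, is dplyr::case_when(), which takes one condition ~ value pair per line. It is vectorised, so it needs no rowwise(). The conditions below are only the toy ones from the question; the real ones would go in their place:

```r
library(dplyr)

df <- data.frame(x = factor(c("A", "B")), y = c(1, 2))

out <- df %>%
  mutate(z = case_when(
    x == "A" ~ y + 1,   # conditions are checked in order; first match wins
    x == "B" ~ y + 2,
    TRUE     ~ NA_real_ # fallback for any other level
  ))
print(out)
```

Because case_when() works on whole columns, the same call could also live inside calculatez, keeping the mutate() line short.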

I can get around the problem by changing my condition to if (x == 1), but that will make my code difficult to understand.

I don't want to waste your time, so sorry if I'm missing something obvious. Any tips on where I'm going wrong?

You could use rowwise() with do():

df %>%
  rowwise() %>%
  do(data.frame(., z = calculatez(.$x, .$y)))

This gives the output:

     x y z
  #1 A 1 2
  #2 B 2 4
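In dplyr 1.0.0 and later, rowwise() was reworked and do() is marked as superseded. Assuming such a version, rowwise() plus mutate() should hand each row's factor value to the function with its class intact, so the question's original approach is expected to work without do(). A sketch, using a print-free stand-in for calculatez:

```r
library(dplyr)

# Same logic as calculatez in the question, minus the print() call
calculatez <- function(x, y) {
  if (x == "A") y + 1 else y + 2
}

df <- data.frame(x = factor(c("A", "B")), y = c(1, 2))

out <- df %>%
  rowwise() %>%                    # each mutate() call sees one row
  mutate(z = calculatez(x, y)) %>%
  ungroup()                        # drop the rowwise grouping afterwards
print(out)
```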

Or you could do:

df %>%
  group_by(N = row_number()) %>%
  mutate(z = calculatez(x, y)) %>%
  ungroup() %>%
  select(-N)

Using a different dataset:

df <- structure(list(x = structure(c(1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), y = c(1, 2, 1, 2, 1)), .Names = c("x", 
"y"), row.names = c(NA, -5L), class = "data.frame")

Running the above code gives:

 #  x y z
 #1 A 1 2
 #2 A 2 3
 #3 B 1 3
 #4 B 2 4
 #5 B 1 3

If you are using data.table:

library(data.table)
setDT(df)[, z := calculatez(x,y), by=seq_len(nrow(df))]
df
#    x y z
# 1: A 1 2
# 2: A 2 3
# 3: B 1 3
# 4: B 2 4
# 5: B 1 3
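For comparison, here is a base-R sketch without dplyr or data.table: mapply() calls the function once per row. Converting x with as.character() first passes the labels rather than relying on how the factor is handled, and USE.NAMES = FALSE stops mapply() from naming the result after those character values. The print-free calculatez below is a stand-in for the question's function:

```r
# Print-free version of the question's calculatez
calculatez <- function(x, y) {
  if (x == "A") y + 1 else y + 2
}

# The five-row dataset from the answer above, written out directly
df <- data.frame(x = factor(c("A", "A", "B", "B", "B")),
                 y = c(1, 2, 1, 2, 1))

# One call per row; as.character() passes the labels, not the codes
df$z <- mapply(calculatez, as.character(df$x), df$y, USE.NAMES = FALSE)
print(df)
```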
