dplyr mutate with function call returning incorrect value

Can someone explain why the following dplyr mutate call, in which I apply a function taking one column as an argument to set the value of a new column, doesn't work? It doesn't seem to be calling the function on the correct value: the new season column is set according to the first value in the mon column instead of the value in its own row.

# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3) 
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9) 
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}

getSeason(5) # Works: returns "Summer"

mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- as.data.frame(mon)

months %>% mutate(season=getSeason(mon))  # doesn't work: all seasons set as "Winter"

I am using R version 3.2.4 and the latest development version of dplyr. (This wasn't working in the latest release of dplyr, either.)

The other answers nicely explained why you were having the problem.

I think this is a situation where the new function case_when could come in handy (currently available in the development version, dplyr_0.4.3.9001).

At the moment you have to use dollar sign notation to use case_when inside mutate.

months %>% mutate(season = case_when(.$mon >= 11 | .$mon <= 3 ~ "Winter",
                                     .$mon >= 5 & .$mon <= 9 ~ "Summer",
                                     TRUE ~ "Trans"))

   mon season
1    1 Winter
2    2 Winter
3    3 Winter
4    4  Trans
5    5 Summer
6    6 Summer
7    7 Summer
8    8 Summer
9    9 Summer
10  10  Trans
11  11 Winter
12  12 Winter
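In later dplyr releases the .$ prefix should no longer be needed. As far as I know, from dplyr 0.7.0 onward case_when can see the data frame's columns directly inside mutate. A minimal sketch under that assumption (the name result is only for illustration):

```r
library(dplyr)

months <- data.frame(mon = 1:12)

# Assumes a dplyr version in which case_when() can refer to columns
# by bare name inside mutate() (believed to be 0.7.0 and later)
result <- months %>%
  mutate(season = case_when(
    mon >= 11 | mon <= 3 ~ "Winter",
    mon >= 5 & mon <= 9  ~ "Summer",
    TRUE                 ~ "Trans"
  ))

print(result)
```

If your installed version still complains that mon cannot be found, the .$mon form above remains the fallback.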

You can build your function using case_when instead of if or ifelse (or the new dplyr function if_else). To me the syntax reads more like a chain of if statements than like nested ifelse calls.

getSeason <- function(m) {
    factor(
        case_when(
            m >= 11 | m <= 3 ~ "Winter",
            m >= 5 & m <= 9 ~ "Summer",
            TRUE ~ "Trans"
            ) 
        )
}

months %>% mutate(season=getSeason(mon))

   mon season
1    1 Winter
2    2 Winter
3    3 Winter
4    4  Trans
5    5 Summer
6    6 Summer
7    7 Summer
8    8 Summer
9    9 Summer
10  10  Trans
11  11 Winter
12  12 Winter

Note that the "everything else" condition comes last in case_when: conditions are checked in order, so you just put TRUE on the left-hand side of the final formula to fill in every remaining row with that value.
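To see why the catch-all has to come last, here is a small sketch (the vector m and the names right_order and wrong_order are made up for illustration). case_when uses the first condition that is TRUE for each element, so a leading TRUE swallows everything:

```r
library(dplyr)

m <- c(1, 4, 7)

# Catch-all last: each value is tested against the specific conditions first
right_order <- case_when(
  m >= 11 | m <= 3 ~ "Winter",
  m >= 5 & m <= 9  ~ "Summer",
  TRUE             ~ "Trans"
)

# Catch-all first: TRUE matches every element, so later conditions never apply
wrong_order <- case_when(
  TRUE            ~ "Trans",
  m >= 5 & m <= 9 ~ "Summer"
)

print(right_order)
print(wrong_order)
```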

You could also use Vectorize:

# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3) 
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9) 
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}


getSeason <- Vectorize(getSeason)

mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- data.frame(mon = mon)

months %>% mutate(season=getSeason(mon))  # call the vectorized getSeason defined above
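Vectorize is essentially a wrapper around mapply, so the scalar function is still called once per element. A variation on the idea, as a sketch only: have the scalar function return plain strings and convert to a factor once at the end, so the result does not depend on how one-element factors get combined. The names get_season_scalar and get_season_vec are hypothetical.

```r
# Scalar version: safe to use if() and || because m is a single number
get_season_scalar <- function(m) {
  if (m >= 11 || m <= 3) return("Winter")
  if (m >= 5 && m <= 9) return("Summer")
  "Trans"
}

# Vectorize() returns a function that applies the scalar one element by element
get_season_vec <- Vectorize(get_season_scalar)

# Convert to a factor once, after all the strings have been collected
seasons <- factor(unname(get_season_vec(1:12)))

print(seasons)
```

This is convenient but not fast: for a long column, a truly vectorized approach such as ifelse or case_when avoids the per-element function calls.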

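if isn't vectorized (weirdly), and neither are || and &&: they look at a single TRUE/FALSE value. When mutate passes the whole mon column to getSeason, only the first value, i.e. 1, is tested, so every row gets Winter.

To avoid this, use ifelse, which is vectorized: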
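The difference between the element-wise and the scalar operators can be seen directly. This is a sketch with made-up variable names; the first element is picked out explicitly here because, as far as I know, recent R releases (around 4.2 for if and 4.3 for ||) raise an error on a longer condition instead of silently using the first element the way R 3.x did.

```r
mon <- 1:12

# | and & are the vectorized operators: one TRUE/FALSE per element
is_winter <- mon >= 11 | mon <= 3

# || and if() are scalar tools, so they are only safe on a single value;
# mon[1] is selected explicitly to mimic what old R did silently
first_only <- mon[1] >= 11 || mon[1] <= 3

print(is_winter)
print(first_only)
```

Because first_only is TRUE, a function built on if and || reports Winter for the whole column.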

months %>% mutate(season = factor(ifelse(mon >= 11 | mon <=3, 
                                         'Winter', ifelse(mon >= 5 & mon <= 9, 
                                                          'Summer', 'Trans'))))
#    mon season
# 1    1 Winter
# 2    2 Winter
# 3    3 Winter
# 4    4  Trans
# 5    5 Summer
# 6    6 Summer
# 7    7 Summer
# 8    8 Summer
# 9    9 Summer
# 10  10  Trans
# 11  11 Winter
# 12  12 Winter

If you want to add enough levels that nesting ifelse calls gets nasty, use cut instead: you're really turning continuous numeric data into factor data, which is the purpose of cut.

months %>% mutate(season = droplevels(cut(mon, c(0, 3, 4, 9, 10, 12), 
                                          c('Winter', 'Trans', 'Summer', 'Trans', 'Winter'))))

Note the droplevels call, which cleans up the duplicate levels in this case; the repeated labels passed to cut will still raise warnings.
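If the repeated labels are a concern, one way around them (a sketch, not the code from the answer) is to give every bin a unique placeholder label and then merge the placeholders by assigning a named list to levels. The placeholder names w1, t1, s, t2 and w2 are invented for this example.

```r
mon <- 1:12

# Each interval gets its own temporary label: (0,3], (3,4], (4,9], (9,10], (10,12]
bins <- cut(mon,
            breaks = c(0, 3, 4, 9, 10, 12),
            labels = c("w1", "t1", "s", "t2", "w2"))

# Assigning a named list to levels() merges the listed old levels into one
season <- bins
levels(season) <- list(Winter = c("w1", "w2"),
                       Trans  = c("t1", "t2"),
                       Summer = "s")

print(season)
```

The same breaks and labels can be used inside mutate; only the merge step is extra.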
