Using dplyr to mutate factor variable conditional on numeric variables

Question

I have data from a sports team tracking wins/losses versus other teams with the following structure:

Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss ...
1    1         0           1        NA        NA       NA
2    1         1           NA       NA        NA       1
3    2         1           NA       NA        1        NA
4    2         2           NA       1         NA       NA
5    3         2           NA       NA        1        NA
...

I want to create a factor variable that includes the team the game was played against so that the data looks like this:

Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss Team
1    1         0           1        NA        NA       NA        Team1
2    1         1           NA       NA        NA       1         Team2
3    2         1           NA       NA        1        NA        Team2
4    2         2           NA       1         NA       NA        Team1
5    3         2           NA       NA        1        NA        Team2
...

My thinking (NOT workable code) is essentially this:

if (Team1Win == 1 | Team1Loss == 1), Team = "Team1"
if (Team2Win == 1 | Team2Loss == 1), Team = "Team2"

I'm really struggling with how to do this in dplyr using mutate. I've tried various approaches with ifelse, recode, etc., but I either keep getting errors or results that are not what I want.

What is correct and most efficient way to make this work in dplyr?

Answer 1

Along similar lines to other answers, but with some useful changes:

na.rm = TRUE in gather
sub works just fine, no need for stringr
includes the full data as per the goal, using a full_join .

library(dplyr)
library(tidyr)

df = read.delim(text = 
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA", sep = " ")

df %>% 
  select(-starts_with("Total")) %>% 
  gather(Team, one, -Game, na.rm = TRUE) %>% 
  select(-one) %>% 
  mutate(Team = sub("Win|Loss", "", Team)) %>% 
  full_join(df, .)

#> Joining, by = "Game"
#>   Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss  Team
#> 1    1         1           0        1        NA       NA        NA Team1
#> 2    2         1           1       NA        NA       NA         1 Team2
#> 3    3         2           1       NA        NA        1        NA Team2
#> 4    4         2           2       NA         1       NA        NA Team1
#> 5    5         3           2       NA        NA        1        NA Team2

Answer 2

I'm a sucker for dplyr approaches to things at the moment so I offer a solution using dplyr that extends to however many teams you might have. It does also use tidyr and stringr as helpfully pointed out in the comment by apom below.

library(dplyr)
library(tidyr)
library(stringr)

df = read_delim(
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA",delim = " ")

df %>% 
  gather("Team",value,contains("Team")) %>% 
  filter(!is.na(value)) %>%
  mutate(Team = str_replace_all(Team,c("Win" = "","Loss" = ""))) %>%
  select(-value)

Answer 3

this could be what you are looking for. ( not hardcoded for just 2 teams)

# solution 1 :
paste0("Team",ceiling(apply(df[-c(1:3)], 1, function(x) which(!is.na(x)))/2))
[1] "Team1" "Team2" "Team2" "Team1" "Team2"

# solution 2: using a apply() {basically a for loop itself]
apply(df[-c(1:3)], 1, function(x) gsub("(Team\\d)+.*", "\\1", colnames(df[-c(1:3)])[which(!is.na(x))]))
# [1] "Team1" "Team2" "Team2" "Team1" "Team2"

# solution 3: (long route to dplyr) [ you have indirectly taught me a lot in dplyr through my search for this solution]
func <- function(x){
  y = which(x == 1)             # get the location of where 1 appears
  z = rep(0, times = length(x)) # create a vector of 0's+location of 1
  z[y] = y                      # i.e. c(0,0,3,0,5) for Team2Win 
  z
}

df1 = df[-c(1:3)] %>% gather("key", "value", starts_with("Team")) %>% 
                group_by(key) %>%
                dplyr::mutate(x = func(value)) %>%
                filter(x != 0) %>% arrange(x) %>% select(key)

df$newcol = gsub("(Team\\d+).*", "\\1", df1$key)

Answer 4

You can do it using a simple loop:

x = colnames(df)
df$team<- NA

for (i in 1:nrow(df))
{
 df$team[i] = x[which(df[i,]==1)]
}

And then at the end you can trim "win" and "loss" using following function:

df$team<- gsub("win", "",df$team)
df$team<- gsub("loss", "",df$team)

Answer 5

I'm pretty sure that you have more than two teams in your data, and team names are not generic. What you want to do is to reshape the data to long form first and extract the relevant team name. So you may want to proceed as follows.

library(dplyr)
library(tidyr)
new_df <- df %>% 
  gather(team,idx,Team1Win:Team100Loss) %>%
  filter(!is.na(idx)) %>%
  select(-idx) %>%
  mutate(team = gsub("Win|Loss","",team))

If you want to keep those wide columns, then you can join the new DF to the old one.

Using dplyr to mutate factor variable conditional on numeric variables

Question

5 answers

solution1
2 ACCPTED 2017-02-02 21:36:37

solution2
1 2017-02-02 20:11:27

solution3
0 2017-02-02 20:01:11

solution4
0 2017-02-02 20:05:36

solution5
0 2017-02-02 20:28:08

Using dplyr to mutate factor variable conditional on numeric variables

Question

5 answers

solution1 2 ACCPTED 2017-02-02 21:36:37

solution2 1 2017-02-02 20:11:27

solution3 0 2017-02-02 20:01:11

solution4 0 2017-02-02 20:05:36

solution5 0 2017-02-02 20:28:08

solution1
2 ACCPTED 2017-02-02 21:36:37

solution2
1 2017-02-02 20:11:27

solution3
0 2017-02-02 20:01:11

solution4
0 2017-02-02 20:05:36

solution5
0 2017-02-02 20:28:08