I'm working with people from the psychology field, and factor analysis is a typical procedure within this area. I have a dataset like the following one:
I want to keep only the highest value in each row and turn all other values into missing values (NA).
I am aware that dplyr can solve this problem easily, but I could not find a simple code example that does it.
Please see the code below to reproduce the question:
library(tidyverse)
set.seed(123)
ds <- data.frame(x1 = runif(10, min = .1, .29), x2 = runif(10, min = .1, .35), x3 = runif(10, min = .1, .38))
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
ds <- ds %>% mutate_all(~ round(., 3))
ds
Please keep in mind that this question can help other people with the same (or similar) problems. I searched before asking and found only one closely related topic.
Thanks much.
A very quick answer would be: use the base function pmax() for the row-wise maximum, and then mutate_all() with an if_else() statement to either keep each value or set it to missing.
ds %>%
  # find the row-wise maximum and store it temporarily as a column
  mutate(max = pmax(x1, x2, x3)) %>%
  # check every column against the max:
  # if the value equals the max, keep it; otherwise set it to NA
  # (funs() is deprecated since dplyr 0.8.0, so a formula lambda is used instead)
  mutate_all(~ if_else(. == max, max, NA_real_)) %>%
  # remove the temporary `max` column
  select(-max)
x1 x2 x3
1 NA NA 0.349
2 NA NA 0.294
3 NA NA 0.279
4 NA NA 0.378
5 NA NA 0.284
6 NA 0.325 NA
7 NA NA 0.252
8 0.270 NA NA
9 0.205 NA NA
10 NA 0.339 NA
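The key point above is that pmax() works element by element, while max() collapses everything into one number. A minimal sketch of the difference, using a made-up data frame called `toy` (hypothetical, not the data from the question). The do.call() line is one possible way to avoid typing every column name and is shown here only as an assumption about what may be convenient with many columns:

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(x1 = c(0.1, 0.5, 0.3),
                  x2 = c(0.4, 0.2, 0.3),
                  x3 = c(0.2, 0.1, 0.6))

# max() collapses all values into a single number
overall_max <- max(toy$x1, toy$x2, toy$x3)

# pmax() compares element by element, giving one maximum per row
row_max <- pmax(toy$x1, toy$x2, toy$x3)

# do.call() passes every column of the data frame to pmax(),
# so the column names do not have to be typed out
row_max_all <- do.call(pmax, toy)
```

Here `overall_max` is a single value, while `row_max` and `row_max_all` each hold one value per row, which is what the mutate() step needs.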
As this place is so supportive, I decided to answer my own question after reading @Lefkios-Paikousis's answer. In real life, when conducting a factor analysis, we get negative results as well as positive ones, and we need to keep the value with the largest absolute value while preserving its sign. For example, -0.4 is larger in magnitude than 0.2, so -0.4 is the value that should be kept.
I built the following code to do what I want. I hope it helps other people with similar questions.
library(tidyverse)
set.seed(123)
ds <- data.frame(x1 = runif(10,min = 0.1,0.29),x2 = runif(10,min = 0.1,0.35), x3 = runif(10,min = 0.1,.38))
ds <- ds %>% mutate_all(~ round(., 3)) # round to 3 decimals (funs() is deprecated since dplyr 0.8.0)
ds <- ds %>% mutate(x1 = x1 * -1) # make x1 negative
ds <- ds %>%
  # pmax() and pmin() already work row by row, so rowwise() is not needed
  mutate(Max.Len = pmax(x1, x2, x3)) %>% # row-wise highest value
  mutate(Min.Len = pmin(x1, x2, x3)) %>% # row-wise lowest value
  mutate(keep = if_else(abs(Max.Len) > abs(Min.Len), Max.Len, Min.Len)) %>% # value with the largest magnitude, sign preserved
  mutate_all(~ if_else(. == keep, keep, NA_real_)) %>% # keep only that value; everything else becomes NA
  select(-c(Max.Len, Min.Len, keep)) # drop the helper columns
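For data with many columns, the same idea can probably be written without naming each column. The following is only a sketch in base R: the helper name `keep_largest_abs` and the `toy` data are invented for illustration, and it assumes all columns are numeric with no missing values.

```r
# Hypothetical helper (name invented for illustration): in each row, keep only
# the value with the largest absolute value, preserving its sign
keep_largest_abs <- function(df) {
  m <- as.matrix(df)
  # column index of the largest absolute value in each row
  # (which.max() returns the first position when there is a tie)
  idx <- apply(abs(m), 1, which.max)
  # start from an all-NA matrix with the same shape and column names
  out <- matrix(NA_real_, nrow = nrow(m), ncol = ncol(m), dimnames = dimnames(m))
  # copy over only the selected cell of each row
  pos <- cbind(seq_len(nrow(m)), idx)
  out[pos] <- m[pos]
  as.data.frame(out)
}

# Hypothetical toy data with mixed signs
toy <- data.frame(x1 = c(-0.4, 0.1, -0.2),
                  x2 = c(0.2, 0.3, 0.1),
                  x3 = c(0.1, -0.5, 0.15))
res <- keep_largest_abs(toy)
```

One difference from the dplyr version: when several cells in a row tie for the largest magnitude, this sketch keeps only the first of them, while the `. == keep` comparison keeps every cell equal to the kept value.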
Thanks