
Using tidyverse to loop over all rows and keep only the highest value in each row

I work with people in psychology, where factor analysis is a standard procedure. I have a dataset like the following:

[Image: original dataset]

I want to keep only the highest value in each row and set all other values to missing (NA):

[Image: desired new dataset]

I know dplyr can solve this easily, but I could not find a simple code example that does it.

You can reproduce the example with the code below:

library(tidyverse)
set.seed(123)
ds <- data.frame(x1 = runif(10, min = .1, max = .29),
                 x2 = runif(10, min = .1, max = .35),
                 x3 = runif(10, min = .1, max = .38))
ds <- ds %>% mutate(across(everything(), ~ round(.x, 3)))  # round to 3 decimals (funs() is deprecated)

ds

I hope this question helps other people with the same (or similar) problems. I searched before asking and found only one closely related topic here.

Thanks much.

A very quick answer would be:

Use the base R function pmax() to get the row-wise maximum, then apply if_else() to each column: keep the value if it equals that maximum, otherwise set it to NA.
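The reason pmax() is used instead of max() is that max() returns a single value over all of its inputs, while pmax() compares its arguments element by element, which for the columns of a data frame means row by row. A minimal illustration, using two made-up vectors as stand-ins for two columns:

```r
# two hypothetical columns with two rows each
x1 <- c(0.2, 0.5)
x2 <- c(0.4, 0.1)

# max() collapses everything into one number
overall_max <- max(x1, x2)   # 0.5

# pmax() works position by position, i.e. row by row
row_max <- pmax(x1, x2)      # 0.4 0.5
```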

ds %>% 
  # find the row-wise maximum and store it temporarily in its own column
  mutate(row_max = pmax(x1, x2, x3)) %>% 
  # loop through x1:x3 and check whether each value equals the row maximum:
  # if yes, keep it; if not, set it to NA
  mutate(across(c(x1, x2, x3), ~ if_else(.x == row_max, .x, NA_real_))) %>% 
  # remove the temporary `row_max` column
  select(-row_max)

      x1    x2    x3
1     NA    NA 0.349
2     NA    NA 0.294
3     NA    NA 0.279
4     NA    NA 0.378
5     NA    NA 0.284
6     NA 0.325    NA
7     NA    NA 0.252
8  0.270    NA    NA
9  0.205    NA    NA
10    NA 0.339    NA
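The answer above names x1, x2 and x3 inside pmax(). If a real loading table has many more columns, the same idea can be written without listing them. The sketch below uses base R rather than dplyr; the function name keep_row_max and the toy data frame are hypothetical, and it assumes every column is numeric with no NA values (pmax() returns NA for a row that contains NA unless na.rm = TRUE).

```r
# Hypothetical helper: keep only the row-wise maximum, set every other cell to NA.
# Assumes all columns of `df` are numeric and contain no NA values.
keep_row_max <- function(df) {
  # do.call() passes each column as a separate argument to pmax(),
  # so this works for any number of columns without typing their names
  row_max <- do.call(pmax, df)
  # df[] <- ... replaces the columns while keeping the data.frame structure
  df[] <- lapply(df, function(col) ifelse(col == row_max, col, NA_real_))
  df
}

# small hypothetical input
toy <- data.frame(x1 = c(0.27, 0.10), x2 = c(0.15, 0.33), x3 = c(0.20, 0.30))
keep_row_max(toy)
```

With this toy input, row 1 keeps only x1 (0.27), row 2 keeps only x2 (0.33), and x3 becomes entirely NA.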

Since this community is so supportive, I decided to answer my own question after reading @Lefkios-Paikousis's answer. In a real factor analysis, results can be positive or negative, and I need to keep the value with the largest magnitude while preserving its sign. For example, -0.4 should be kept over 0.2, because |-0.4| > |0.2|.

I built the following code to do what I want. I hope it helps other people with similar questions.

  library(tidyverse)
  set.seed(123)
  ds <- data.frame(x1 = runif(10, min = 0.1, max = 0.29),
                   x2 = runif(10, min = 0.1, max = 0.35),
                   x3 = runif(10, min = 0.1, max = 0.38))
  ds <- ds %>% mutate(across(everything(), ~ round(.x, 3)))  # round to 3 decimals
  ds <- ds %>% mutate(x1 = x1 * -1)  # make x1 negative

  # pmax() and pmin() are already vectorized over rows, so rowwise() is not needed
  ds <- ds %>% 
    mutate(Max.Len = pmax(x1, x2, x3)) %>%  # highest value in each row
    mutate(Min.Len = pmin(x1, x2, x3)) %>%  # lowest value in each row
    # value with the largest magnitude, sign kept (a tie in magnitude goes to Min.Len)
    mutate(keep = if_else(abs(Max.Len) > abs(Min.Len), Max.Len, Min.Len)) %>%
    # keep only that value in x1:x3 and set every other cell to NA
    mutate(across(c(x1, x2, x3), ~ if_else(.x == keep, .x, NA_real_))) %>%
    select(-c(Max.Len, Min.Len, keep))  # drop the helper columns

[Image: original dataset]

[Image: transformed dataset]

Thanks
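The "largest magnitude, sign preserved" rule from the self-answer can also be expressed by comparing absolute values directly, instead of computing both the maximum and the minimum. This is a base R sketch, not the tidyverse approach used in the thread; the function name keep_row_absmax and the toy data are hypothetical, and it assumes all columns are numeric with no NA values.

```r
# Hypothetical helper: in each row keep the value with the largest absolute value
# (sign preserved) and set every other cell to NA.
# Assumes all columns of `df` are numeric and contain no NA values.
# If two cells tie in magnitude (e.g. -0.3 and 0.3), both are kept.
keep_row_absmax <- function(df) {
  # largest magnitude in each row, computed on the absolute values of every column
  row_absmax <- do.call(pmax, lapply(df, abs))
  df[] <- lapply(df, function(col) ifelse(abs(col) == row_absmax, col, NA_real_))
  df
}

# small hypothetical input with negative values in x1
toy <- data.frame(x1 = c(-0.40, -0.10), x2 = c(0.20, 0.25), x3 = c(0.10, 0.05))
keep_row_absmax(toy)
```

For this toy data, row 1 keeps -0.40 in x1 and row 2 keeps 0.25 in x2. One design difference from the Max.Len/Min.Len version: when two cells tie in magnitude, this sketch keeps both, whereas if_else(abs(Max.Len) > abs(Min.Len), Max.Len, Min.Len) falls back to the negative Min.Len.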
