Using tidyverse to loop over all rows and keep only the highest value
I'm working with people from the psychology field, and factor analysis is a typical procedure within this area. I have a dataset like the following one:
I want to preserve only the highest value in each row and set all other values to missing (NA).
I am aware dplyr can solve this problem easily, but I could not find a simple code example that does it.
Please check the code below to reproduce this question:
library(tidyverse)
set.seed(123)
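ds <- data.frame(x1 = runif(10, min = .1, max = .29), x2 = runif(10, min = .1, max = .35), x3 = runif(10, min = .1, max = .38))
# round every column to 3 decimals (across() replaces the deprecated mutate_all(funs(...)))
ds <- ds %>% mutate(across(everything(), ~ round(.x, 3)))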
ds
Please keep in mind that this question can help other people with the same (or similar) problems. I searched before asking and found just one closely related topic here.
Thanks much.
A very quick answer would be:
Use the pmax (base) function for the row-wise maximum and then mutate_all with an if_else statement to either keep the value or set it to missing.
ds %>%
  # find the row-wise maximum and store it as a column temporarily
  mutate(max = pmax(x1, x2, x3)) %>%
  # loop through the data columns and check whether each value equals the max
  # if yes, leave it as is; if not, set it to NA
  # (across() replaces the deprecated mutate_all(funs(...)))
  mutate(across(x1:x3, ~ if_else(.x == max, .x, NA_real_))) %>%
  # remove the temporary `max` column
  select(-max)
      x1    x2    x3
1     NA    NA 0.349
2     NA    NA 0.294
3     NA    NA 0.279
4     NA    NA 0.378
5     NA    NA 0.284
6     NA 0.325    NA
7     NA    NA 0.252
8  0.270    NA    NA
9  0.205    NA    NA
10    NA 0.339    NA
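If the column names are not known in advance, the same idea can be sketched in base R without naming each column. This is only an illustration, not part of the answer above: the toy data frame and the object names (toy, row_max, keep_row_max) are made up for the example.

```r
# Toy data (hypothetical values, not the dataset from the question)
toy <- data.frame(x1 = c(0.10, 0.27, 0.15),
                  x2 = c(0.20, 0.11, 0.33),
                  x3 = c(0.35, 0.12, 0.30))

# Row-wise maximum over every column: do.call() passes each column to pmax()
row_max <- do.call(pmax, toy)

# Blank out every cell that differs from its row's maximum.
# Note: if two cells in a row tie for the maximum, both are kept.
keep_row_max <- toy
keep_row_max[toy != row_max] <- NA
keep_row_max
```

Because the comparison is an exact equality test on copies of the same numbers, no floating-point tolerance is needed here.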
As this place is so supportive, I decided to answer my own question after reading @Lefkios-Paikousis's answer. In real life, when conducting a factor analysis, we get positive as well as negative results, and we need to keep the value with the largest magnitude together with its sign. As an example, -0.4 counts as higher than 0.2 (because its absolute value is larger), so -0.4 should be kept.
I built the following code to do what I want. I hope it helps other people with similar questions.
library(tidyverse)
set.seed(123)
ds <- data.frame(x1 = runif(10, min = 0.1, max = 0.29), x2 = runif(10, min = 0.1, max = 0.35), x3 = runif(10, min = 0.1, max = 0.38))
ds <- ds %>% mutate(across(everything(), ~ round(.x, 3))) # round to 3 decimals
ds <- ds %>% mutate(x1 = x1 * -1) # make x1 negative
# pmax()/pmin() are already vectorised over rows, so rowwise() is not needed
ds <- ds %>%
  mutate(Max.Len = pmax(x1, x2, x3)) %>% # highest value in each row
  mutate(Min.Len = pmin(x1, x2, x3)) %>% # lowest value in each row
  # whichever of the two has the larger absolute value, with its sign
  mutate(keep = if_else(abs(Max.Len) > abs(Min.Len), Max.Len, Min.Len)) %>%
  # keep only that value (sign preserved); set everything else to NA
  mutate(across(x1:x3, ~ if_else(.x == keep, .x, NA_real_))) %>%
  select(-c(Max.Len, Min.Len, keep)) # drop the helper columns
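As a cross-check on small data, the "largest absolute value, sign preserved" rule can also be sketched in base R with max.col(). This is an alternative illustration under my own assumptions, not the code above: the toy values and the names toy, m, idx, pos and out are hypothetical, and ties are resolved to the leftmost column here, whereas the dplyr version keeps every cell equal to the chosen value.

```r
# Toy data (hypothetical values) mixing positive and negative entries
toy <- data.frame(x1 = c(-0.40,  0.10, -0.05),
                  x2 = c( 0.20, -0.30,  0.25),
                  x3 = c( 0.10,  0.15, -0.20))
m <- as.matrix(toy)

# Column index of the largest absolute value in each row
# (ties.method = "first" picks the leftmost column on ties)
idx <- max.col(abs(m), ties.method = "first")

# Start from an all-NA matrix and copy back only the selected cells,
# taken from the original matrix so the sign is preserved
out <- matrix(NA_real_, nrow(m), ncol(m), dimnames = dimnames(m))
pos <- cbind(seq_len(nrow(m)), idx)
out[pos] <- m[pos]
as.data.frame(out)
```

In the first row, -0.40 is kept rather than 0.20, matching the rule described in this answer.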
Thanks