This is my data.frame/data.table
library(data.table)
df<- fread('
A B C D SecondLargest
1 3 2 4 B
6 3 5 4 C
7 3 7 1 A
6 9 3 2 A
')
I am trying to extract the column name with second largest value(in parallel) in my "SecondLargest" desried column.
I have tried with no success.
df[,SecondLargest:= colnames(df[,c(1:4),
with=FALSE])[apply(df[,c(1:4),with=FALSE],1,function(x)
x[rank(1/x, ties.method='first')==2])]]
I am ok with both first or last type of tie breaker. You help is highly appreciated!
We can try with melt
. Here, I subset the dataset to remove the expected output, created a row number column 'rn', melt
to 'long' format', order
the 'value' in descending after grouping by 'rn', select the second element of 'variable' and assign ( :=
) it as "SecondLargest" column in 'df'.
df[, SecondLargest := melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[,
variable[order(-value)][2] , rn]$V1]
df
# A B C D SecondLargest
#1: 1 3 2 4 B
#2: 6 3 5 4 C
#3: 7 3 7 1 C
#4: 6 9 3 2 A
If we need the first tie,
melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[,
variable[order(-value[!duplicated(value, fromLast=TRUE)])][2] , rn]$V1
#[1] B C A A
We could also do:
df$SecondLargest <- colnames(df)[apply(df,1,function(x)which(x==sort(x,partial=3)[3])[1])]
# A B C D SecondLargest
# 1 1 3 2 4 B
# 2 6 3 5 4 C
# 3 7 3 7 1 A
# 4 6 9 3 2 A
partial
argument of sort
is handful since it does not sort all elements but rather sorts partially based on what we need. 3 = ncol(df)-1
which means second largest.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.