My data
Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey
How to Achieve the following expected output:
A data.frame which consists of multiple columns of the same data type is an indication that the data might be reshaped from wide to long format. Therefore, melt()
is used to get rid of the NA
s and join to add the new column to the original data.frame:
library(data.table)
DT[melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
order(variable), .(New = last(value)), by = rn], on = .(rn)][, rn := NULL][]
Type1 Type2 Type3 Expected_Output New 1: Red Orange Pink Pink Pink 2: Green abc NA abc abc 3: Blue NA NA Blue Blue 4: white NA Green Green Green 5: NA Brown purple purple grey 6: NA black NA black purple 7: grey NA NA grey black
fread()
is used to read the sample data set. The na.strings
parameter tells fread()
to convert the "na"
strings to NA
:
library(data.table)
DT <- fread(
"Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey ",
na.strings = "na")
The OP has requested that also lines which are fully NA
should appear in the output. This can be achieved by changing the order of data.table
objects in the right join. In data.table
syntax, X[Y]
is a right join which takes all rows of Y
. If all rows of X
are required, the right join Y[X]
has to be used
library(data.table)
# new data with 8th row
DT <- fread(
"Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey
na na na na",
na.strings = "na")
melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
order(variable), .(New = last(value)), by = rn][DT, on = .(rn)][, rn := NULL][]
New Type1 Type2 Type3 Expected_Output 1: Pink Red Orange Pink Pink 2: abc Green abc NA abc 3: Blue Blue NA NA Blue 4: Green white NA Green Green 5: purple NA Brown purple purple 6: black NA black NA black 7: grey grey NA NA grey 8: NA NA NA NA NA
You can do that with last
from dplyr
inside an apply
call. Make sure your na
are NA
so that na.omit
ignores them.
library(dplyr)
df[df=="na"] <- NA #change "na" to NA
df$expected2 <-apply(df[,1:3],1,function(x) last(na.omit(x)))
Type1 Type2 Type3 Expected_Output expected2
1 Red Orange Pink Pink Pink
2 Green abc <NA> abc abc
3 Blue <NA> <NA> Blue Blue
4 white <NA> Green Green Green
5 <NA> Brown purple purple purple
6 <NA> black <NA> black black
7 grey <NA> <NA> grey grey
data
df <- read.table(text="Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey ",header=TRUE,stringsAsFactors=FALSE)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.