简体   繁体   中英

Creating new column based on the other columns blank values in R Program

My data

Type1   Type2   Type3       Expected_Output                 
Red     Orange  Pink        Pink                
Green   abc     na          abc             
Blue     na     na          Blue                
white    na     Green       Green               
na     Brown    purple      purple              
na     black     na         black               
grey    na       na         grey                

How to Achieve the following expected output:

  1. If type3 is filled than expected output should be type3's content
  2. If type2 is filled and type3 is blank(na) than expected output should be type2's content
  3. If type3 and type2 is blank(na) than expcetd output should be type1's content

A data.frame which consists of multiple columns of the same data type is an indication that the data might be reshaped from wide to long format. Therefore, melt() is used to get rid of the NA s and join to add the new column to the original data.frame:

library(data.table)
DT[melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
    order(variable), .(New = last(value)), by = rn], on = .(rn)][, rn := NULL][]
 Type1 Type2 Type3 Expected_Output New 1: Red Orange Pink Pink Pink 2: Green abc NA abc abc 3: Blue NA NA Blue Blue 4: white NA Green Green Green 5: NA Brown purple purple grey 6: NA black NA black purple 7: grey NA NA grey black

Data

fread() is used to read the sample data set. The na.strings parameter tells fread() to convert the "na" strings to NA :

library(data.table)
DT <- fread(
"Type1   Type2   Type3       Expected_Output                 
Red     Orange  Pink        Pink                
Green   abc     na          abc             
Blue     na     na          Blue                
white    na     Green       Green               
na     Brown    purple      purple              
na     black     na         black               
grey    na       na         grey    ",
na.strings = "na")

Edit

The OP has requested that also lines which are fully NA should appear in the output. This can be achieved by changing the order of data.table objects in the right join. In data.table syntax, X[Y] is a right join which takes all rows of Y . If all rows of X are required, the right join Y[X] has to be used

library(data.table)
# new data with 8th row
DT <- fread(
  "Type1   Type2   Type3       Expected_Output                 
  Red     Orange  Pink        Pink                
  Green   abc     na          abc             
  Blue     na     na          Blue                
  white    na     Green       Green               
  na     Brown    purple      purple              
  na     black     na         black               
  grey    na       na         grey    
  na      na       na         na",
  na.strings = "na")

melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
  order(variable), .(New = last(value)), by = rn][DT, on = .(rn)][, rn := NULL][]
 New Type1 Type2 Type3 Expected_Output 1: Pink Red Orange Pink Pink 2: abc Green abc NA abc 3: Blue Blue NA NA Blue 4: Green white NA Green Green 5: purple NA Brown purple purple 6: black NA black NA black 7: grey grey NA NA grey 8: NA NA NA NA NA

You can do that with last from dplyr inside an apply call. Make sure your na are NA so that na.omit ignores them.

library(dplyr)
df[df=="na"] <- NA #change "na" to NA
df$expected2 <-apply(df[,1:3],1,function(x) last(na.omit(x)))

  Type1  Type2  Type3 Expected_Output expected2
1   Red Orange   Pink            Pink      Pink
2 Green    abc   <NA>             abc       abc
3  Blue   <NA>   <NA>            Blue      Blue
4 white   <NA>  Green           Green     Green
5  <NA>  Brown purple          purple    purple
6  <NA>  black   <NA>           black     black
7  grey   <NA>   <NA>            grey      grey

data

df <- read.table(text="Type1   Type2   Type3 Expected_Output                 
Red     Orange  Pink        Pink                
                 Green   abc     na          abc             
                 Blue     na     na          Blue                
                 white    na     Green       Green               
                 na     Brown    purple      purple              
                 na     black     na         black               
                 grey    na       na         grey ",header=TRUE,stringsAsFactors=FALSE)

Here is another simple solution based on ifelse; hope it helps.

DT$new <- ifelse(is.na(DT$Type3), ifelse(is.na(DT$Type2),DT$Type1,DT$Type2),DT$Type3)

在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM