
How to create and populate a new column (Z) with another column's data (Y) but only if a certain observation in a third column (X) is present?

I'm fairly new to R and have scoured this site for help, but I cannot find exactly what I am looking for. Let's assume I have the following (this example is not exhaustive; my actual df has 2,000+ rows):

 ID      "A"        "B"
  1   11-18-17      1
  2   10-10-10      1
  3   07-02-96      1
  4   01-13-20      2
  5   02-01-98      2
  6   03-04-64      1
  7   11-13-84      1
  8   11-07-20      2

Where column A is full of dates, and column B is just numbers (this is not a dummy variable). I want to make a column C that is populated with the dates from column A only if there is a 2 present in column B. So, it would look like this:

ID      "A"       "B"     "C"
 1   11-18-17      1      
 2   10-10-10      1
 3   07-02-96      1
 4   01-13-20      2    01-13-20
 5   02-01-98      2    02-01-98
 6   03-04-64      1
 7   11-13-84      1
 8   11-07-20      2    11-07-20

If the condition B = 2 is not met, I would prefer that column C show up as blank. Can anyone please provide me some help with this? I would greatly appreciate it!

Simply use the vectorized ifelse() to assign values conditionally on another column:

Data

txt <- 'ID      "A"        "B"
  1   "11-18-17"      1
  2   "10-10-10"      1
  3   "07-02-96"      1
  4   "01-13-20"      2
  5   "02-01-98"      2
  6   "03-04-64"      1
  7   "11-13-84"      1
  8   "11-07-20"      2'

df <- read.table(text=txt, header=TRUE)
df

#    ID        A B
#  1  1 11-18-17 1
#  2  2 10-10-10 1
#  3  3 07-02-96 1
#  4  4 01-13-20 2
#  5  5 02-01-98 2
#  6  6 03-04-64 1
#  7  7 11-13-84 1
#  8  8 11-07-20 2

Solution

df$C <- with(df, ifelse(B==2, as.character(A), NA_character_))
df

#    ID        A B        C
#  1  1 11-18-17 1     <NA>
#  2  2 10-10-10 1     <NA>
#  3  3 07-02-96 1     <NA>
#  4  4 01-13-20 2 01-13-20
#  5  5 02-01-98 2 02-01-98
#  6  6 03-04-64 1     <NA>
#  7  7 11-13-84 1     <NA>
#  8  8 11-07-20 2 11-07-20
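
If a literal blank is preferred over NA, one option is to use an empty string as the fallback value. This is only a sketch: it assumes column C can stay character rather than being converted to Date, and it rebuilds the example data with data.frame() so that it runs on its own.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  ID = 1:8,
  A  = c("11-18-17", "10-10-10", "07-02-96", "01-13-20",
         "02-01-98", "03-04-64", "11-13-84", "11-07-20"),
  B  = c(1, 1, 1, 2, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Use "" instead of NA_character_ so rows where B is not 2 print as blank
df$C <- with(df, ifelse(B == 2, as.character(A), ""))
df
```

The trade-off is that C is then plain text, so date arithmetic on it would need a conversion with as.Date() first.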

This should work, except that column 'c' holds NA rather than a blank where column 'b' equals 1. A blank is a character value (class("") returns "character"), and a single column in R cannot hold both character and Date values. Hope this helps.

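library(dplyr)
b <- c(1, 1, 1, 2, 2, 1, 1, 2)
a <- rep("12-13-20", 8)
df <- data.frame(a, b) %>%
  # parse the strings as month-day-year dates
  mutate(a = as.Date(a, "%m-%d-%y")) %>%
  # a typed NA keeps column c as Date; recent dplyr versions may reject NULL here
  mutate(c = if_else(b == 2, a, as.Date(NA)))
print(df)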

           a b          c
1 2020-12-13 1       <NA>
2 2020-12-13 1       <NA>
3 2020-12-13 1       <NA>
4 2020-12-13 2 2020-12-13
5 2020-12-13 2 2020-12-13
6 2020-12-13 1       <NA>
7 2020-12-13 1       <NA>
8 2020-12-13 2 2020-12-13
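
If the blanks are only needed for display or export, one possible approach is to keep the Date column for calculations and derive a separate character version from it. The sketch below uses base R only; the names c_date and c_text are made up for illustration.

```r
# Same toy data as above, built with base R only
a <- as.Date(rep("12-13-20", 8), format = "%m-%d-%y")
b <- c(1, 1, 1, 2, 2, 1, 1, 2)

# Date version: NA wherever b is not 2
c_date <- a
c_date[b != 2] <- NA

# Character version for display: format the dates, then blank out the NAs
c_text <- format(c_date, "%m-%d-%y")
c_text[is.na(c_text)] <- ""

data.frame(a, b, c_date, c_text)
```

Here c_date remains a proper Date vector, while c_text is character and shows an empty string in the rows where b is not 2.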


 