
R code, creating number of new columns based on conditions

I'd really appreciate anyone's help in approaching this problem. It's a lot to ask, so any and all advice is appreciated!

Here is a sample set that @dardisco created when I was asking a similar question a month ago. a and b represent two different tests, and the number (09, 10, 11) represents the year the test was done.

Ultimately, I want to figure out:

  1. Number of Positive B tests / Number of Positive A tests, for 2010 and for 2011

  2. Number of Positive B tests / Total Number of B tests, for 2010 and for 2011

A few things that I have to check first:

  • If an A test was done in both 2009 and 2010, I would take the result from 2010. The same goes for the B test.
  • I want to remove any case where someone had a B test without having had an A test first (the same year is OK). There shouldn't be any such cases, but I want to know how to check for them.
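
One possible way to run the second check is sketched below. It is only an illustration under stated assumptions: the data frame `toy` is made up for this example (it is not the sample data further down), and any non-NA value, whatever the result, is assumed to mean the test was done.

```r
# Hypothetical toy data; NA is assumed to mean "test not done"
toy <- data.frame(
  id  = 1:4,
  a09 = c("pos", NA,    NA,    NA),
  a10 = c(NA,    NA,    "neg", NA),
  a11 = c(NA,    "pos", NA,    NA),
  b10 = c("pos", "neg", NA,    NA),
  b11 = c(NA,    NA,    "pos", "neg"),
  stringsAsFactors = FALSE
)

# Was any A test done up to and including 2010 / 2011?
a_by_10 <- !is.na(toy$a09) | !is.na(toy$a10)
a_by_11 <- a_by_10 | !is.na(toy$a11)

# A B test counts as "orphaned" if no A test exists in that year or earlier
orphan_b10 <- !is.na(toy$b10) & !a_by_10
orphan_b11 <- !is.na(toy$b11) & !a_by_11

# ids with at least one orphaned B test
bad_ids <- toy$id[orphan_b10 | orphan_b11]
print(bad_ids)

# Blank out only the orphaned B results, keeping the rows themselves
toy$b10[orphan_b10] <- NA
toy$b11[orphan_b11] <- NA
```

In the toy data, id 2 has a 2010 B test but no A test until 2011, and id 4 has a B test with no A test at all, so those two ids are flagged. Dropping the whole row instead of blanking the B result would be `toy[!(toy$id %in% bad_ids), ]`.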

If anyone has ANY advice, I'd appreciate it! If you just want to address part of the problem (either what I ultimately want to figure out, or the checks I want to make first), that'd be great. I'm not sure if I should use nested ifelse statements, or if something else would be better...

If you need any more info, let me know!

vals1 <- c(NA, "pos", "neg", "nr")
set.seed(1)
df1 <- data.frame(
  id  = 1:10,
  a09 = sample(vals1, 10, replace = TRUE),
  a10 = sample(vals1, 10, replace = TRUE),
  a11 = sample(vals1, 10, replace = TRUE),
  b10 = sample(vals1, 10, replace = TRUE),
  b11 = sample(vals1, 10, replace = TRUE),
  stringsAsFactors = FALSE  # keep results as character (already the default in R >= 4.0)
)

### modify to give at least one case meeting each of your criteria
df1[10,c(5,6)] <- NA # 2x NAs for b's
df1[1,c(2,3,4)] <- NA # 3x NAs for a's
df1[2,c(2,4,5,6)] <- NA # all NAs

OK, for the first question: if I understand correctly, you want to consider only the most recent A and B tests, right?

# a: default to the 2011 result
df1$a <- df1$a11
# where a is still missing, fall back to the 2010 result (a09 is not consulted)
df1[is.na(df1$a), "a"] <- df1[is.na(df1$a), "a10"]

# b: default to the 2011 result
df1$b <- df1$b11
# where b is still missing, fall back to the 2010 result
df1[is.na(df1$b), "b"] <- df1[is.na(df1$b), "b10"]

# set b to NA wherever a is NA
df1[is.na(df1$a), "b"] <- NA

# number of positive a's
num.pos.a <- sum(df1$a == "pos", na.rm = TRUE)
# number of positive b's
num.pos.b <- sum(df1$b == "pos", na.rm = TRUE)

Is that what you wanted?
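
If the two ratios are needed separately for 2010 and 2011, something along these lines might work. This is a rough sketch, not a definitive solution: `toy`, `first_non_na` and `n_pos` are names invented for this example, the "most recent result up to that year" rule is my reading of the question, and any non-NA value is assumed to count as a test that was done. It does not apply the "B without a prior A" filter.

```r
# Hypothetical toy data; NA is assumed to mean "test not done"
toy <- data.frame(
  id  = 1:5,
  a09 = c("pos", "neg", NA,    "pos", NA),
  a10 = c(NA,    "pos", "pos", "neg", NA),
  a11 = c("pos", NA,    NA,    "pos", "neg"),
  b10 = c("pos", "neg", NA,    NA,    NA),
  b11 = c(NA,    "pos", "pos", "neg", NA),
  stringsAsFactors = FALSE
)

# Element-wise: take the first non-NA value across the supplied vectors
first_non_na <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# Most recent result available in each year (assumed interpretation)
a_2010 <- first_non_na(toy$a10, toy$a09)
a_2011 <- first_non_na(toy$a11, toy$a10, toy$a09)
b_2010 <- toy$b10
b_2011 <- first_non_na(toy$b11, toy$b10)

# Count positives, ignoring NA
n_pos <- function(x) sum(x == "pos", na.rm = TRUE)

ratios <- data.frame(
  year             = c(2010, 2011),
  pos_b_over_pos_a = c(n_pos(b_2010) / n_pos(a_2010),
                       n_pos(b_2011) / n_pos(a_2011)),
  pos_b_over_all_b = c(n_pos(b_2010) / sum(!is.na(b_2010)),
                       n_pos(b_2011) / sum(!is.na(b_2011)))
)
print(ratios)
```

For this toy data the 2010 row comes out as 1/3 and 1/2, and the 2011 row as 3/4 and 3/4. If an earlier result should not carry forward into a later year, replace the `first_non_na()` calls with the single-year columns.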
