Add columns and their values based on several other columns

I have data like this:

id      diag1   diag2   diag3  diag4    diag5   diag5 diag 6     diag7  diag8    diag9

26  V3000   75261   V053    V290                                
23  V3001   75261   V053                                    
24  V3000   75537   75567   V053                                
19  V3001   7503    7613    7746    7631    7560    V290    76529   V1819           
29  V3001   77989   7470    7852    V053                            
31  V3000   75261   79415   77989   V053                            
33  V3000   7700    75329   7705    7750    7706    77089   7746    7661    75251     
20  V3000   7530    7795    76529   V053    V183                        
17  V3000   75329   7788    V053                                
22  4659    7862    7455    V7285                               
21  V3000   7503    77181   7579    7560    75251                       
30  V3000   7470    V053                                    
27  V3000   76519   7470    7726    7746    76719   76528   V053    V502    

I would like to add variables d1-d40 whose values are based on:

if any of diag1 to diag9 has '75261' then d1 = 1 else d1 = 0

if any of diag1 to diag9 has '7700' then d2 = 1 else d2 = 0

if any of diag1 to diag9 has '7613' or '75329' then d3 = 1 else d3 = 0

if any of diag1 to diag9 has '7470' or '7746' then d4 = 1 else d4 = 0, etc.

I used code like this:

 bd$d40 = 0
 for (i in ncol(bd){
   if (bd[,i]  %in% ('75261')) {bd[,'d40'] = 1}
}

But it did not work. Thanks.

It sounds to me like you're trying to determine whether a given row contains a particular code. You could do this with the apply() function:

d1 <- apply(bd, 1, function(x) as.numeric("75261" %in% x))
d2 <- apply(bd, 1, function(x) as.numeric("7700" %in% x))
...
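
To avoid writing forty near-identical lines, the same idea could be driven from a named list of code sets. This is only a minimal sketch, not the answerer's code: the toy data frame `bd`, the list `code_sets`, and the helper `has_any` are illustrative names. It assumes the diagnosis columns are stored as character strings and that an indicator with several codes should be 1 when any one of them is present.

```r
# Toy data (hypothetical): three rows from the question, character columns.
bd <- data.frame(
  id    = c(26, 19, 33),
  diag1 = c("V3000", "V3001", "V3000"),
  diag2 = c("75261", "7503", "7700"),
  diag3 = c("V053", "7613", "75329"),
  diag4 = c("V290", "7746", "7705"),
  stringsAsFactors = FALSE
)

# One entry per indicator; each entry lists the codes that switch it on.
code_sets <- list(
  d1 = "75261",
  d2 = "7700",
  d3 = c("7613", "75329"),
  d4 = c("7470", "7746")
)

# Fix the set of diagnosis columns before any indicator columns are added.
diag_cols <- grep("^diag", names(bd), value = TRUE)

# TRUE for each row in which at least one of `codes` appears.
has_any <- function(df, codes) {
  apply(df, 1, function(row) any(codes %in% row))
}

for (nm in names(code_sets)) {
  bd[[nm]] <- as.numeric(has_any(bd[diag_cols], code_sets[[nm]]))
}
```

Selecting the diagnosis columns by name keeps the id column and the newly added indicators out of the search.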

I can still remember the aha-moment when I realized that SAS expressions all had an implicit for-loop that would be run (only within the current dataset) when they were executed. R code can be built that does the same thing, but requires an explicit range of rows to get the vectorization working properly as well as proper assignment to the particular target set among all the items in the workspace.

This might get one of your for-loops working properly:

bd$d40 = 0
for (i in 2:10 ) {
    bd$d40 <- ifelse ( bd[,i]  %in% '75261',  1, bd$d40) 
}

You really don't want to write for (i in ncol(bd)): ncol(bd) is a single number, so the loop body runs only once, and the number of columns is also growing as you add variables. You also need the column-oriented function ifelse rather than if. What SAS and SPSS call "if" is a column-oriented construct, whereas R has two different constructs: if, which tests a single condition, and ifelse, which is the analog you want here. Also notice that the ifelse does not overwrite previous 1-values (although my first posting did).
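
The difference between the two constructs can be seen on a small vector. This is a sketch with made-up values (`codes` and `m` are illustrative names, not objects from the question):

```r
codes <- c("75261", "V053", "75261")

# ifelse() is vectorized: it returns one result per element.
flag <- ifelse(codes %in% "75261", 1, 0)

# if() expects a single logical value, so collapse the vector first,
# for example with any().
found <- if (any(codes %in% "75261")) 1 else 0

# ncol() returns a single number, so this loop body runs exactly once.
m <- matrix(0, nrow = 2, ncol = 4)
visited <- c()
for (i in ncol(m)) visited <- c(visited, i)

# seq_len() yields 1, 2, ..., ncol(m), which is usually what is intended.
all_cols <- seq_len(ncol(m))
```

Here `flag` has three elements while `found` has one, and `visited` ends up holding only the column count rather than every column index.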

R also encourages you to write functions that operate on data objects. In your case you want to apply a test to a block of columns and get a row-oriented answer, so you could encapsulate that action in an analog of the R function pmax, which returns a column-oriented max value (on rereading, you could also call it row-oriented; either way it suits blocks of columns because it does its calculations by row):

  pany <- function(df, items) {   # edited to allow match for > 1 item
                   apply(df, 1, function(row) length(intersect(row, items)) >= 1) }
  pany(bd[,2:10], '75261')
 [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [12] FALSE FALSE
bd$d40 <- as.numeric(pany(bd[,2:10], '75261'))
bd$d40
 [1] 1 1 0 0 0 1 0 0 0 0 0 0 0
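
Because apply() first converts the data frame to a matrix, a variant that tests column by column may be worth considering. This is a different technique from pany above, sketched under the assumption that the diagnosis columns are character; `pany2` and the toy `bd` are illustrative names.

```r
# Toy data (hypothetical): the first three rows of the question, truncated.
bd <- data.frame(
  id    = c(26, 23, 24),
  diag1 = c("V3000", "V3001", "V3000"),
  diag2 = c("75261", "75261", "75537"),
  diag3 = c("V053", "V053", "75567"),
  stringsAsFactors = FALSE
)

# Test each column with %in%, then count the hits per row.
pany2 <- function(df, items) {
  hits <- vapply(df, function(col) col %in% items, logical(nrow(df)))
  # matrix() keeps the shape stable even when df has a single row.
  rowSums(matrix(hits, nrow = nrow(df))) > 0
}

bd$d40 <- as.numeric(pany2(bd[, 2:4], "75261"))
```

On this toy input the first two rows contain '75261' and the third does not, matching the first three values printed above.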
