I have data like this:
id diag1 diag2 diag3 diag4 diag5 diag5 diag6 diag7 diag8 diag9
26 V3000 75261 V053 V290
23 V3001 75261 V053
24 V3000 75537 75567 V053
19 V3001 7503 7613 7746 7631 7560 V290 76529 V1819
29 V3001 77989 7470 7852 V053
31 V3000 75261 79415 77989 V053
33 V3000 7700 75329 7705 7750 7706 77089 7746 7661 75251
20 V3000 7530 7795 76529 V053 V183
17 V3000 75329 7788 V053
22 4659 7862 7455 V7285
21 V3000 7503 77181 7579 7560 75251
30 V3000 7470 V053
27 V3000 76519 7470 7726 7746 76719 76528 V053 V502
I would like to add variables d1-d40 whose values are based on:
if diag1 to diag9 contain '75261' then d1 = 1 else d1 = 0
if diag1 to diag9 contain '7700' then d2 = 1 else d2 = 0
if diag1 to diag9 contain '7613', '75329' then d3 = 1 else d3 = 0
if diag1 to diag9 contain '7470', '7746' then d4 = 1 else d4 = 0, etc.
I used code like this:
bd$d40 = 0
for (i in ncol(bd){
if (bd[,i] %in% ('75261')) {bd[,'d40'] = 1}
}
But it did not work. Thanks.
It sounds to me like you're trying to determine whether a given row contains a particular code. You would do this with the apply() function:
d1 <- apply(bd, 1, function(x) as.numeric("75261" %in% x))
d2 <- apply(bd, 1, function(x) as.numeric("7700" %in% x))
...
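The same idea can be extended to several flag variables at once. The following is only a sketch under assumptions: the toy rows, the code_sets list, and the names diag_cols and diag_mat are made up for illustration, and the codes are assumed to be stored as character strings with NA in empty slots. Restricting the test to the diagnosis columns keeps the id column out of the match.

```r
# Toy data (assumed layout): codes as character, NA where a slot is empty.
bd <- data.frame(
  id    = c(26, 23, 24, 33),
  diag1 = c("V3000", "V3001", "V3000", "V3000"),
  diag2 = c("75261", "75261", "75537", "7700"),
  diag3 = c("V053", "V053", "75567", "75329"),
  diag4 = c("V290", NA, "V053", "7746"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one set of codes per flag variable.
code_sets <- list(
  d1 = "75261",
  d2 = "7700",
  d3 = c("7613", "75329"),
  d4 = c("7470", "7746")
)

# Work on the diagnosis columns only, as a character matrix.
diag_cols <- grep("^diag", names(bd), value = TRUE)
diag_mat  <- as.matrix(bd[, diag_cols])

for (v in names(code_sets)) {
  # 1 if any diagnosis in the row belongs to the set, otherwise 0
  bd[[v]] <- as.numeric(apply(diag_mat, 1, function(x) any(x %in% code_sets[[v]])))
}

bd[, c("id", names(code_sets))]
```

Adding another flag then only means adding another entry to code_sets.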
I can still remember the aha-moment when I realized that SAS expressions all had an implicit for-loop that would be run (only within the current dataset) when they were executed. R code can be built that does the same thing, but requires an explicit range of rows to get the vectorization working properly as well as proper assignment to the particular target set among all the items in the workspace.
This might get one of your for-loops working properly:
bd$d40 = 0
for (i in 2:10 ) {
bd$d40 <- ifelse ( bd[,i] %in% '75261', 1, bd$d40)
}
You really don't want to say for (i in ncol(bd)), because the number of columns is growing; it also loops over only the single value ncol(bd) rather than over each column. And you really need to use the column-oriented function ifelse() rather than if. "If" in R is really two different constructs, whereas in SAS and SPSS it is a column-oriented construct for which the R analog is ifelse() rather than if. Also notice that I did not overwrite previous 1-values with the ifelse() (except that my first posting did).
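To make the difference between if and ifelse() concrete, here is a small sketch on a made-up vector (x, flag, and has_code are illustrative names, not part of the original data). ifelse() tests every element, while if expects a single TRUE or FALSE, so a vector has to be collapsed first, for example with any().

```r
x <- c("75261", "7700", "75261")

# ifelse() is vectorized: one result per element of x
flag <- ifelse(x %in% "75261", 1, 0)
flag

# if() takes a single logical value, so collapse the vector with any() first
has_code <- if (any(x %in% "75261")) 1 else 0
has_code
```

Passing the whole vector x %in% "75261" straight to if is an error in recent versions of R; older versions used only the first element and issued a warning.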
R also encourages you to write functions that operate on data objects. In your case you want to apply a test to a block of columns and get a row-oriented answer, so you could encapsulate that action with an analog of the R function pmax, which returns a column-oriented max value (although, reading that again, I guess you could also say it is row-oriented; either way it is good for blocks of columns because it does its calculations by row):
pany <- function(df, items) { # edited to allow match for > 1 item
apply(df, 1, function(row) length(intersect(row, items)) >= 1) }
pany(bd[,2:10], '75261')
[1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[12] FALSE FALSE
bd$d40 <- as.numeric(pany(bd[,2:10], '75261'))
bd$d40
[1] 1 1 0 0 0 1 0 0 0 0 0 0 0
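Because pany() accepts more than one item, it can also cover the multi-code rules such as d3 and d4. The sketch below assumes a three-row toy data frame with character columns (made up for illustration) and repeats the function so the example is self-contained.

```r
# Row-wise "any match" helper, as defined above
pany <- function(df, items) {
  apply(df, 1, function(row) length(intersect(row, items)) >= 1)
}

# Toy data (assumed layout), diagnosis codes stored as character
bd <- data.frame(
  id    = c(19, 33, 30),
  diag1 = c("V3001", "V3000", "V3000"),
  diag2 = c("7503", "7700", "7470"),
  diag3 = c("7613", "75329", "V053"),
  stringsAsFactors = FALSE
)

# A row is flagged when it contains at least one of the listed codes
bd$d3 <- as.numeric(pany(bd[, 2:4], c("7613", "75329")))
bd$d4 <- as.numeric(pany(bd[, 2:4], c("7470", "7746")))
bd
```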