I have a large matrix called data with 10,864 rows and 134 columns.
The first 4 columns are parameters which make every row unique. The values in columns 5 to 134 are, for all rows, numbers between 1 and 20.
I am running a for loop over the matrix to insert NA into certain cells. This needs to be done on the basis of unique values from the columns OrgID, rank and score(i), but only if the value in the same row for column score(i+12) != 1.
Hence, I run a for loop from column 5 to 134, and where there is duplication based on these three columns and the value in the score(i+12) column is not equal to 1, I insert NA into that cell of the matrix.
for(i in 5:ncol(data)) {
  data[which(duplicated(data[,c(1,4,i)]) & (data[,i+12])!=1),i] <- "NA"
}
This code, however, gives the wrong output by inserting NA only where there is a duplicated value on the basis of the 1st, 4th and i-th columns, i.e. a result equivalent to running the following code:
for(i in 5:ncol(data)) {
  data[which(duplicated(data[,c(1,4,i)])),i] <- "NA"
}
How do I make it perform the required operation only when the value in column score(i+12) != 1 in the duplicated rows?
To make it simpler to see the failed output, I have highlighted a few rows and the relevant columns to show how this works when applied to column 118, i.e. i = 118 here.
For example, based on the logic explained above, there is duplication in OrgID=5659. The duplication based on OrgID, rank and score118 identifies these 2 rows, with one row showing score130=1 and the other score130=16. Hence, the value in the row with score130=16 should now be NA according to the logic, but it remains unchanged at 16.
Maybe you can try
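# Note: data[c(1,4,i)] and data[[i + 12]] assume data is a data frame, not a matrix
for(i in 5:(ncol(data) - 12)) {
  # flag every row of a duplicated group, not just the second and later occurrences
  inds <- duplicated(data[c(1,4,i)]) | duplicated(data[c(1,4,i)], fromLast = TRUE)
  # among those rows, set score(i+12) to a real NA where it is not 1
  data[inds & data[[i + 12]] != 1, i + 12] <- NA
}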
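To see why the fromLast = TRUE part matters, here is a minimal sketch on a made-up data frame. The name toy, its column names and its values are all hypothetical and only stand in for OrgID, rank, score(i) and score(i+12). It suggests one plausible reason the original loop left the 16 untouched: duplicated() on its own does not flag the first row of a duplicated group.

```r
# Hypothetical toy data: score_a plays the role of score(i),
# score_b the role of score(i+12).
toy <- data.frame(
  OrgID   = c(5659, 5659, 1001, 1002),
  rank    = c(2, 2, 1, 1),
  score_a = c(7, 7, 3, 3),
  score_b = c(16, 1, 5, 9)
)

key <- toy[c("OrgID", "rank", "score_a")]

# duplicated() alone marks only the second and later occurrences of a key.
later_only <- duplicated(key)

# OR-ing with fromLast = TRUE marks every row that belongs to a duplicated group.
all_dups <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Set score_b to NA for rows in a duplicated group whose score_b is not 1.
toy$score_b[all_dups & toy$score_b != 1] <- NA

print(later_only)
print(all_dups)
print(toy$score_b)
```

In this sketch the row holding 16 is the first of its pair, so only the combined mask reaches it. Note also that the unquoted NA is a missing value, whereas the quoted "NA" in the question is a character string and would turn the numeric data into text.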