It seems like this should be an easy task with apply, but I still can't figure it out. I have data like this:
x1= c(1,1,2,3,1,2,4)
x2= c(1,2,2,6,2,3,1)
x3= c(1,1,1,0,0,0,0)
x4= c(1,0,0,0,0,3,1)
df=data.frame( x1,x2,x3,x4)
df
x1 x2 x3 x4
1 1 1 1 1
2 1 2 1 0
3 2 2 1 0
4 3 6 0 0
5 1 2 0 0
6 2 3 0 3
7 4 1 0 1
And a vector like this:
m= c(1,1,0,0)
rbind(df,m)
df=rbind(df,m)
df
x1 x2 x3 x4
1 1 1 1 1
2 1 2 1 0
3 2 2 1 0
4 3 6 0 0
5 1 2 0 0
6 2 3 0 3
7 4 1 0 1
8 1 1 0 0
Now I'd like every value in a column that equals the value in the last row of that column (the m vector) to be changed to 0, and every other value to 1; the last row itself should stay as it is. For example, df[1,2] is 1, which is the same as m[2], so df2[1,2] is 0. The new data set would then look like this:
df2
x1 x2 x3 x4
1 0 0 1 1
2 0 1 1 0
3 1 1 1 0
4 1 1 0 0
5 0 1 0 0
6 1 1 0 1
7 1 0 0 1
8 1 1 0 0
Using the 'df' dataset after the rbind, we compare all rows except the last one (df[-8,]) with the last row, replicated so that the lengths are the same (df[8,][col(df[-8,])]). This returns a logical matrix, which can be coerced to binary by wrapping it in +. Then we rbind the binary output with the last row of 'df' (df[8,]) to get the expected output.
df2 <- rbind(+(df[-8,]!=df[8,][col(df[-8,])]), df[8,])
df2
# x1 x2 x3 x4
#1 0 0 1 1
#2 0 1 1 0
#3 1 1 1 0
#4 1 1 0 0
#5 0 1 0 0
#6 1 1 0 1
#7 1 0 0 1
#8 1 1 0 0
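The same idea can be written without hard-coding the row index 8. This is only a sketch: the names n, last, body and flags are made up for illustration, and it assumes the reference values always sit in the last row of 'df'.

```r
# Rebuild the example data from the question (8 rows after the rbind).
df <- data.frame(x1 = c(1, 1, 2, 3, 1, 2, 4),
                 x2 = c(1, 2, 2, 6, 2, 3, 1),
                 x3 = c(1, 1, 1, 0, 0, 0, 0),
                 x4 = c(1, 0, 0, 0, 0, 3, 1))
m  <- c(1, 1, 0, 0)
df <- rbind(df, m)

n    <- nrow(df)                 # position of the reference row (assumed to be last)
last <- unlist(df[n, ])          # reference row as a plain numeric vector
body <- df[-n, , drop = FALSE]   # every row except the reference row

# last[col(body)] repeats each reference value once per row of 'body',
# in column-major order, so the comparison is element by element.
flags <- +(body != last[col(body)])

# Put the untouched reference row back underneath the 0/1 flags.
df2 <- rbind(flags, df[n, ])
df2
```

Using nrow(df) keeps the code valid if rows are added to or removed from the data later on.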
Or, as @DavidArenburg mentioned, this can be made more compact by comparing 'df' before the rbind step with the vector ('m').
m1 <- rbind(+(df != m[col(df)]), m)
row.names(m1) <- NULL
To understand it better: we replicate the 'm' vector using the col function, which returns the numeric column index of every element of 'df'.
col(df)
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 4
#[2,] 1 2 3 4
#[3,] 1 2 3 4
#[4,] 1 2 3 4
#[5,] 1 2 3 4
#[6,] 1 2 3 4
#[7,] 1 2 3 4
Indexing 'm' with this matrix,
m[col(df)]
#[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
the first element of 'm' (i.e. 1) is replicated 7 times, followed by the second element (1) 7 times, and so on.
Now, the lengths are the same
length( m[col(df)])
#[1] 28
prod(dim(df))
#[1] 28
to have an element-by-element comparison.
df != m[col(df)]
# x1 x2 x3 x4
#[1,] FALSE FALSE TRUE TRUE
#[2,] FALSE TRUE TRUE FALSE
#[3,] TRUE TRUE TRUE FALSE
#[4,] TRUE TRUE FALSE FALSE
#[5,] FALSE TRUE FALSE FALSE
#[6,] TRUE TRUE FALSE TRUE
#[7,] TRUE FALSE FALSE TRUE
In the last step, we coerce this to binary and rbind to 'm'.
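As a small sketch of that last step, using the 7-row 'df' from before the rbind: the unary + turns TRUE/FALSE into integer 1/0 and keeps the matrix shape. The intermediate names lgl and bin are only illustrative.

```r
# 'df' as it was before the rbind (7 rows), plus the reference vector.
df <- data.frame(x1 = c(1, 1, 2, 3, 1, 2, 4),
                 x2 = c(1, 2, 2, 6, 2, 3, 1),
                 x3 = c(1, 1, 1, 0, 0, 0, 0),
                 x4 = c(1, 0, 0, 0, 0, 3, 1))
m <- c(1, 1, 0, 0)

lgl <- df != m[col(df)]   # logical matrix: TRUE where the value differs from m
bin <- +lgl               # unary plus coerces logical to integer, dims are kept
m1  <- rbind(bin, m)      # append the reference vector as the last row
rownames(m1) <- NULL      # drop the row label that rbind derives from 'm'
m1
```

Note that m1 is a matrix, not a data frame; wrap it in as.data.frame() if a data frame is needed.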
Another option is to use sweep with MARGIN=2:
rbind(+(sweep(df, 2 ,m ,'!=')), m)
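To see that sweep with MARGIN=2 does the same job as the col() indexing, the two can be compared directly. This is a sketch on a matrix copy of the 7-row data; the names mat, by_sweep and by_col are only illustrative.

```r
# The 7-row data from before the rbind, converted to a matrix.
df <- data.frame(x1 = c(1, 1, 2, 3, 1, 2, 4),
                 x2 = c(1, 2, 2, 6, 2, 3, 1),
                 x3 = c(1, 1, 1, 0, 0, 0, 0),
                 x4 = c(1, 0, 0, 0, 0, 3, 1))
m   <- c(1, 1, 0, 0)
mat <- as.matrix(df)

# MARGIN = 2 pairs m[j] with every element of column j before applying "!=".
by_sweep <- sweep(mat, 2, m, "!=")

# The col() approach builds the same pairing by explicit indexing.
by_col <- mat != m[col(mat)]

# Both give the same logical matrix.
stopifnot(all(by_sweep == by_col))
```

sweep mostly saves you from building the expanded vector yourself; the comparison that follows is the same.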
You could try the following:
df2 <- t(t(df) != m) * 1   # 'df' is the 8-row data after the rbind;
                           # transpose so that m lines up with the variables, compare,
                           # and transpose the logical matrix back to the original layout;
                           # multiplying by 1 turns TRUE/FALSE into 1/0 (the result is a matrix)
df2[nrow(df2),] <- m       # keep the last row unchanged
#> df2
# x1 x2 x3 x4
#1 0 0 1 1
#2 0 1 1 0
#3 1 1 1 0
#4 1 1 0 0
#5 0 1 0 0
#6 1 1 0 1
#7 1 0 0 1
#8 1 1 0 0
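The double transpose is needed because R recycles a shorter vector down the columns of a matrix. A small sketch of the difference, on a matrix copy of the 7-row data (the names mat, wrong and right are only illustrative):

```r
# The 7-row data from before the rbind, as a matrix.
mat <- cbind(x1 = c(1, 1, 2, 3, 1, 2, 4),
             x2 = c(1, 2, 2, 6, 2, 3, 1),
             x3 = c(1, 1, 1, 0, 0, 0, 0),
             x4 = c(1, 0, 0, 0, 0, 3, 1))
m <- c(1, 1, 0, 0)

# Comparing directly recycles m down each column, so m[2] ends up being
# compared with mat[2, 1] rather than with the values of column 2.
wrong <- mat != m

# After t(), the variables are rows, so the length-4 vector m lines up with
# the 4 rows of every column; the outer t() restores the original layout.
right <- t(t(mat) != m)

# 'right' agrees with the col() approach, 'wrong' does not.
stopifnot(all(right == (mat != m[col(mat)])))
stopifnot(!all(wrong == right))
```

No recycling warning is raised for the direct comparison here, because 28 is a multiple of 4, so the misalignment is easy to miss.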