
Comparing elements from different columns but from the same data frame with R

I am trying to determine sequence similarity. I would like to write a function that compares the elements of a data frame column by column, as in the following example:

   V1  V2 V3  V4
1  C   D  A   D  
2  A   A  S   E 
3  V   T  T   V
4  A   T  S   S 
5  C   D  R   Y 
6  C   A  D   V
7  V   T  E   T 
8  A   T  A   A
9  R   V  V   W
10 W   R  D   D
  

I want to compare the first element of the first column with the first element of the second column: if they match the result is 1, otherwise 0. Then the second element of the first column is compared with the second element of the second column, and so on.

For example:

C != D -----0
A == A -----1

In the same way, I would like to compare column 1 with column 2, then with column 3 and column 4; then column 2 with column 3 and column 4; and finally column 3 with column 4.

The output would be just the numbers (shown here for column 1 versus column 2):

0
1
0
0
0
0
0
0
0
0

I tried the following but it doesn't work:

compared_df <- ifelse(df_trial$V1==df_trial$V2,1,ifelse(df_trial$V1==df_trial$V2,0,NA))
compared_df

As suggested, I tried the following:

compared_df1 <- df_trial$matches <- as.integer(df_trial$V1 == df_trial$V2)

This works well for comparing a single pair of columns. Is there a way to compare all of the column pairs described above at once?

As @Ronak Shah said in the comments, the following is sufficient if you want to compare two columns:

df$matches <- as.integer(df$V1 == df$V2)

Another option, which also works for more than two columns, is to use apply to count the unique elements in each row. Note that this gives 1 only when every column in the row holds the same value, so it is not a pairwise comparison:

df$matches = apply(df, 1, function(x) as.integer(length(unique(x)) == 1))
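For the pairwise comparison described in the question (column 1 with columns 2, 3 and 4, then column 2 with columns 3 and 4, then column 3 with column 4), one possible approach is to enumerate the column pairs with combn and apply the same as.integer comparison to each pair. This is only a sketch, not something taken from the answer above: the data frame is rebuilt from the sample in the question, and the names pairs and matches and the "V1_V2" style column labels are chosen here for illustration.

```r
# Sample data from the question (column names V1..V4 as shown there).
df <- data.frame(
  V1 = c("C", "A", "V", "A", "C", "C", "V", "A", "R", "W"),
  V2 = c("D", "A", "T", "T", "D", "A", "T", "T", "V", "R"),
  V3 = c("A", "S", "T", "S", "R", "D", "E", "A", "V", "D"),
  V4 = c("D", "E", "V", "S", "Y", "V", "T", "A", "W", "D"),
  stringsAsFactors = FALSE
)

# All unordered column pairs: V1-V2, V1-V3, V1-V4, V2-V3, V2-V4, V3-V4.
pairs <- combn(names(df), 2, simplify = FALSE)

# One 0/1 column per pair, comparing the two columns row by row.
matches <- sapply(pairs, function(p) as.integer(df[[p[1]]] == df[[p[2]]]))

# Label each result column with the pair it came from (illustrative naming).
colnames(matches) <- sapply(pairs, paste, collapse = "_")

matches           # 10 x 6 matrix of 0/1 values
colSums(matches)  # number of matching rows for each pair
```

The result is kept as a separate matrix rather than added to df, so the comparison columns do not get mixed into later row-wise operations on the original data.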
