Comparing two columns in a dataframe using R or Excel

Question

I have a csv file containing two columns, "Taxon" in column A and "Tip" in column C. I would like to compare column A against column C: if a string in column A matches any string in column C, I'd like to print "y" or something similar in column B next to that string; if not, I'd like to print "n" or equivalent. Here is the beginning of my data:

Taxon                                   B     Tip
Nitrosotalea devanaterra                     Methanothermobacter thermautotrophicus
Nitrososphaera gargensis                     Methanobacterium beijingense
Nitrososphaera sca5445                       Methanobacterium bryantii
Nitrososphaera sca2170                       Methanosarcina mazei
Methanobacterium beijingense                 Persephonella marina
Methanobacterium bryantii                    Sulfurihydrogenibium azorense
Methanothermobacter thermautotrophicus       Balnearium lithotrophicum
Methanosarcina mazei                         Isosphaera pallida
Koribacter versatilis                        Methanobacterium beijingense
Acidicapsa borealis                          Parachlamydia acanthamoebae
Acidobacterium capsulatum                    Leptospira biflexa

This is only a small part of the data, but the idea is that "n" would be printed in column B for every taxon except those that are also found in the "Tip" column, such as "Methanobacterium beijingense" and "Methanobacterium bryantii", which would get "y". These could also just be "1" and "0".

I know dplyr has some good functions for filtering and joining data, but I can't find anything that exactly matches my needs. An alternative method using Excel would be fine too.

Thanks.

Answer 1

For Excel, use the following formula in B2:

=if(isnumber(match(a2, c:c, 0)), "y", "n")

Fill down, or double-click the fill handle (the 'drag button').

Answer 2

A method using R and dplyr:

# create example data 
x = read.table(header = TRUE, stringsAsFactors = FALSE, text = 
"Taxon                                   B     Tip
Nitrosotalea_devanaterra                   1  Methanothermobacter_thermautotrophicus
Nitrososphaera_gargensis                   1  Methanobacterium_beijingense
Nitrososphaera_sca5445                     1  Methanobacterium_bryantii
Nitrososphaera_sca2170                     1  Methanosarcina_mazei
Methanobacterium_beijingense               1  Persephonella_marina
Methanobacterium_bryantii                  1  Sulfurihydrogenibium_azorense
Methanothermobacter_thermautotrophicus     1  Balnearium_lithotrophicum
Methanosarcina_mazei                       1  Isosphaera_pallida
Koribacter_versatilis                      1  Methanobacterium_beijingense
Acidicapsa_borealis                        1  Parachlamydia_acanthamoebae
Acidobacterium_capsulatum                  1  Leptospira_biflexa")

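# Data management part
library(dplyr)

# put both columns under the same name so they can be joined
x1 = data.frame(A = x$Taxon, B = x$B)
x2 = data.frame(A = x$Tip, B = x$B)

# anti_join keeps the rows of x1 whose taxon never appears in Tip
unmatched = anti_join(x1, x2, by = "A")

# B stays 1 for taxa found in Tip and becomes 0 for the rest
x$B[x$Taxon %in% unmatched$A] = 0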
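
If a join is more than you need, the same membership test can probably be done in base R with %in%. This is not part of the original answer, only a minimal sketch. It assumes the data frame has the columns Taxon and Tip from the question, and it uses just the first six rows of the question's data, typed in by hand for illustration.

```r
# First six rows of the question's data (illustrative subset)
x <- data.frame(
  Taxon = c("Nitrosotalea devanaterra",
            "Nitrososphaera gargensis",
            "Nitrososphaera sca5445",
            "Nitrososphaera sca2170",
            "Methanobacterium beijingense",
            "Methanobacterium bryantii"),
  Tip = c("Methanothermobacter thermautotrophicus",
          "Methanobacterium beijingense",
          "Methanobacterium bryantii",
          "Methanosarcina mazei",
          "Persephonella marina",
          "Sulfurihydrogenibium azorense"),
  stringsAsFactors = FALSE
)

# %in% is TRUE for every Taxon that occurs anywhere in the Tip column
x$B <- ifelse(x$Taxon %in% x$Tip, "y", "n")

print(x[, c("Taxon", "B", "Tip")])
```

If you would rather have 1 and 0 than "y" and "n", as.integer(x$Taxon %in% x$Tip) gives that directly.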
