R function to label one table with ID from another table?

Question

This is a simple question, but I'm probably not using the right keywords in Google to find the answer, so apologies if it has been asked before.

Basically, I have one Excel document with about 10000 gene names (in random order) for some Brassica plants I had sequenced. I have another document with the same gene names, plus some more, in sorted order, with the Arabidopsis gene each one corresponds to in the column next to it.

So for example:

File 1:

BnAxyz
BnAklm
BnAdef
Etc...

File 2:

BnAabc AtAxyz
BnAdef AtAypi
BnAghi AtApqr

Essentially, I want to annotate my sequenced Brassica genes (file 1) with their correct Arabidopsis label (second column of file 2) without reordering file 1 (so just adding a column to file 1 but so that each gene corresponds to its correct name).

I have tried to merge the lists in R, but that doesn't work. Does anyone know how I could attempt this in R?

Thank you very much for any help.

Answer 1

It would really help if you could post the R code you have used so far. In the absence of that, we can only guess which data structures you're actually dealing with.

Anyway, your problem can be solved in a straightforward manner using the tidyverse.

Here's a rough draft:

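library(tidyverse)

# File 1: a single column of Brassica gene names, no header row
df_bras <- read_csv(
  "brassica_genes.csv",
  col_names = c("gene_bras"),
  col_types = "c")
# File 2: Brassica gene name plus the corresponding Arabidopsis gene
df_arab <- read_csv(
  "arabidopsis_genes.csv",
  col_names = c("gene_bras", "gene_arab"),
  col_types = "cc")

# left_join keeps every row of df_bras, in its original order
df <- df_bras %>% left_join(df_arab, by = c("gene_bras"))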

The resulting data frame df contains all Brassica genes from df_bras in their original order, along with the Arabidopsis gene name if it is present in df_arab, or NA otherwise.
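
If you would rather avoid extra packages, the same lookup can be sketched in base R with match(). This is only an illustration: the small data frames below are typed in by hand from the example in the question, and the column names gene_bras and gene_arab are assumptions, not names from your actual files.

```r
# Example data copied from the question; file 1 stays in its original order
bras <- data.frame(gene_bras = c("BnAxyz", "BnAklm", "BnAdef"),
                   stringsAsFactors = FALSE)
arab <- data.frame(gene_bras = c("BnAabc", "BnAdef", "BnAghi"),
                   gene_arab = c("AtAxyz", "AtAypi", "AtApqr"),
                   stringsAsFactors = FALSE)

# For each gene in file 1, find its row position in file 2 (NA if absent)
idx <- match(bras$gene_bras, arab$gene_bras)

# Use those positions to pull the Arabidopsis name into a new column
bras$gene_arab <- arab$gene_arab[idx]

print(bras)
```

Because match() only looks things up and never sorts, the rows of bras stay exactly where they were. Note that match() returns the first hit only, so if a Brassica gene appeared more than once in file 2, later entries would be ignored.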
