
R function to label one table with ID from another table?

This is a simple question - but I think I'm probably not including key words in google to find the right answer, so I am very sorry about that.

Basically, I have one Excel document with about 10000 gene names for some Brassica plants I had sequenced (in random order), and another document with the same (and more) gene names (ordered), with the Arabidopsis gene each one corresponds to in the column next to it.

So for example:

File 1:

  1. BnAxyz
  2. BnAklm
  3. BnAdef
  4. Etc...

File 2:

  1. BnAabc AtAxyz
  2. BnAdef AtAypi
  3. BnAghi AtApqr

Essentially, I want to annotate my sequenced Brassica genes (file 1) with their correct Arabidopsis label (second column of file 2) without reordering file 1 (so just adding a column to file 1 but so that each gene corresponds to its correct name).

I have tried to merge the lists in R, but that doesn't work. Does anyone know how I could attempt this in R?

Thank you very much for any help.

It would really help if you could post the R code you have used so far. In the absence of that, we can only guess which types of data structures you're actually dealing with.

Anyway, your problem should be solvable in a straightforward manner using the tidyverse.

Here's a rough draft:

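library(tidyverse)

# File 1: one column of Brassica gene names (file name is a placeholder)
df_bras <- read_csv(
  "brassica_genes.csv", 
  col_names = c("gene_bras"), 
  col_types = "c")
# File 2: Brassica gene name plus the corresponding Arabidopsis gene
df_arab <- read_csv(
  "arabidopsis_genes.csv", 
  col_names = c("gene_bras", "gene_arab"), 
  col_types = "cc")

# Keep every row of df_bras and look up its Arabidopsis gene in df_arab
df <- df_bras %>% left_join(df_arab, by = c("gene_bras"))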

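The resulting data frame df would contain all Brassica genes in their original order, each with its Arabidopsis gene name (if it is present in df_arab) or NA. A left join keeps the rows of its first argument in place, so file 1 is not reordered.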
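If you want to check the idea without any CSV files, here is a minimal base R sketch of the same lookup using match(). The two small data frames are made up from the example in the question, and the column names gene_bras and gene_arab are just the ones assumed in the draft above, so adapt them to your real data.

```r
# Toy data modelled on the example in the question (values are illustrative).
df_bras <- data.frame(
  gene_bras = c("BnAxyz", "BnAklm", "BnAdef"),
  stringsAsFactors = FALSE
)
df_arab <- data.frame(
  gene_bras = c("BnAabc", "BnAdef", "BnAghi"),
  gene_arab = c("AtAxyz", "AtAypi", "AtApqr"),
  stringsAsFactors = FALSE
)

# For each gene in df_bras, find the row of its first match in df_arab
# (NA when the gene is not listed there).
idx <- match(df_bras$gene_bras, df_arab$gene_bras)

# Add the annotation as a new column; the row order of df_bras is untouched.
df_bras$gene_arab <- df_arab$gene_arab[idx]

print(df_bras)
```

With this toy data only BnAdef gets a label (AtAypi); the other two rows get NA. One difference to keep in mind: if a Brassica gene appears more than once in the second table, match() uses only the first hit, whereas left_join() would return one row per match.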
