
R, change rownames in df1 based on column in df2 (matching names)

I have two data frames: one holds gene names and their counts, and the other holds the gene names and their ontological names. I want to replace the gene names in df1 with the names they map to in df2.

Sample data:

df1 <- data.frame(ID=c("gene1","gene2","gene3"), sample1=c(1,0,50), sample2=c(0,0,0), sample3=c(45,56,11))
rownames(df1) <- df1$ID
df1$ID <- NULL

> df1
      sample1 sample2 sample3
gene1       1       0      45
gene2       0       0      56
gene4      50       0      11

df2 <- data.frame(ID=c("gene1","gene2","gene3", "gene4"), name=c("hr1","gene2","exoc like exoc1 in drosophila", "ftp"), desc=c("protein","unknown","fake immunity known for fighting viruses", "like ftp1"))
rownames(df2) <- df2$ID
df2$ID <- NULL

> df2
          name      desc
gene1     hr1       protein
gene2     gene2     unknown
gene3     exoc      like exoc1 in drosophila fake immunity known for fighting viruses
gene4     ftp       like ftp1

What I want is for the row names of df1 to be replaced with the matching values from the "name" column of df2. df2 contains every gene name, with the ontological name in its first column; some of those genes are missing from df1.

Expected output:

> df1.new
      sample1 sample2 sample3
hr1       1       0      45
gene2     0       0      56
ftp      50       0      11

I'm not familiar enough with tidyverse to update the names with it. The difficulty comes from how my data frames are loaded: the gene names are the index (row names) rather than a regular column. I tried adapting the only similar question I could find (R - replace specific values in df with values from other df by matching row names), but I am trying to update the index row names rather than values.

I've tried variations of:

df1 <- df1[na.omit(match(rownames(df1), df2$name)),] # throws an error

library(dplyr)
library(tibble)
rownames_to_column(df1) %>% rows_update(df2 %>% rownames_to_column(df1), by ="rowname") %>% column_to_rownames(df1) # Error, Names repair functions must return a character vector

I'm having trouble because what I want to match and update is the index, using a column from a second data frame.

Another option (by the way, your code does not match the data frames shown):

> map = df2$name
> names(map) = rownames(df2)
> df1.new = df1
> rownames(df1.new) = map[rownames(df1)]
> df1.new
      sample1 sample2 sample3
hr1         1       0      45
gene2       0       0      56
exoc       50       0      11
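The named-vector lookup above returns NA for any gene that has no entry in df2, and row names cannot be NA. Below is a minimal sketch of the same idea with a fallback to the original ID. It assumes the data as printed in the question (gene1, gene2, gene4 in df1 and the short "exoc" name in df2), not as created by the question's code; the fallback step and the make.unique() call are additions for illustration, not part of the answer above.

```r
# Data rebuilt to match the printed frames in the question (an assumption,
# since the question's code and its printouts disagree).
df1 <- data.frame(
  sample1 = c(1, 0, 50),
  sample2 = c(0, 0, 0),
  sample3 = c(45, 56, 11),
  row.names = c("gene1", "gene2", "gene4")
)
df2 <- data.frame(
  name = c("hr1", "gene2", "exoc", "ftp"),
  desc = c("protein", "unknown", "like exoc1 in drosophila", "like ftp1"),
  row.names = c("gene1", "gene2", "gene3", "gene4"),
  stringsAsFactors = FALSE
)

# Named character vector: names are gene IDs, values are the new names.
map <- setNames(df2$name, rownames(df2))

# Look up each df1 row name; unname() drops the gene-ID names from the result.
new_names <- unname(map[rownames(df1)])

# Illustrative safeguard: keep the original ID where df2 has no entry.
missing <- is.na(new_names)
new_names[missing] <- rownames(df1)[missing]

df1.new <- df1
# make.unique() guards against duplicated names, which row names do not allow.
rownames(df1.new) <- make.unique(new_names)
df1.new
```

With this data every gene in df1 has an entry in df2, so the fallback is never triggered and the row names become hr1, gene2 and ftp.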

The code that creates df1 and df2 does not match the df1 and df2 you show, but here is a way to get the result column I think you want; you can then remove any columns you don't need.

library(dplyr)
library(tibble)
library(tidyr)
df1 %>%
  rownames_to_column(var = "gene") %>%
  left_join(
    df2 %>% rownames_to_column(var = "gene"),
    by = "gene"
  ) %>%
  mutate(result = ifelse(desc == "unknown", gene, desc))
#    gene sample1 sample2 sample3                          name                                     desc
# 1 gene1       1       0      45                           hr1                                  protein
# 2 gene2       0       0      56                Unknown origin                                  unknown
# 3 gene3      50       0      11 exoc like exoc1 in drosophila fake immunity known for fighting viruses
#                                     result
# 1                                  protein
# 2                                    gene2
# 3 fake immunity known for fighting viruses

Here is a slightly modified version of @Gregor Thomas's answer:

library(tibble)
library(dplyr)

left_join(df1 %>% 
            rownames_to_column("gene"), 
          df2 %>% 
            rownames_to_column("gene"), 
          by="gene") %>% 
  column_to_rownames("name") %>% 
  select(starts_with("sample"))
      sample1 sample2 sample3
hr1         1       0      45
gene2       0       0      56
ftp        50       0      11
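For comparison, here is a base R sketch that does not need tidyverse. The match() attempt in the question compares the row names of df1 against df2$name, but the gene IDs are stored in rownames(df2), so that is the vector to match against. The data is again assumed to be as printed in the question rather than as created by its code, and dropping genes that are absent from df2 is one possible choice, not something the question asks for.

```r
# Data rebuilt to match the printed frames in the question (an assumption).
df1 <- data.frame(
  sample1 = c(1, 0, 50),
  sample2 = c(0, 0, 0),
  sample3 = c(45, 56, 11),
  row.names = c("gene1", "gene2", "gene4")
)
df2 <- data.frame(
  name = c("hr1", "gene2", "exoc", "ftp"),
  desc = c("protein", "unknown", "like exoc1 in drosophila", "like ftp1"),
  row.names = c("gene1", "gene2", "gene3", "gene4"),
  stringsAsFactors = FALSE
)

# Position of each df1 gene ID within df2's row names (NA if the gene is absent).
idx <- match(rownames(df1), rownames(df2))
found <- !is.na(idx)

# Keep only genes that have an entry in df2, then rename them.
df1.new <- df1[found, , drop = FALSE]
rownames(df1.new) <- df2$name[idx[found]]
df1.new
```

Matching on rownames(df2) keeps the row order of df1 intact, so the counts stay attached to the right gene after renaming.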
