R: change row names in df1 based on a column in df2 (matching names)
I have two dataframes: one has gene names and their counts, and the second has the gene names and their ontological names. I want to update the gene names in df1 with the names they map to in df2.
Sample data:
df1 <- data.frame(ID=c("gene1","gene2","gene3"), sample1=c(1,0,50), sample2=c(0,0,0), sample3=c(45,56,11))
rownames(df1) <- df1$ID
df1$ID <- NULL
> df1
sample1 sample2 sample3
gene1 1 0 45
gene2 0 0 56
gene4 50 0 11
df2 <- data.frame(ID=c("gene1","gene2","gene3", "gene4"), name=c("hr1","gene2","exoc like exoc1 in drosophila", "ftp"), desc=c("protein","unknown","fake immunity known for fighting viruses", "like ftp1"))
rownames(df2) <- df2$ID
df2$ID <- NULL
> df2
name desc
gene1 hr1 protein
gene2 gene2 unknown
gene3 exoc like exoc1 in drosophila fake immunity known for fighting viruses
gene4 ftp like ftp1
What I want is for the df1 row names to be updated using the values in the "name" column of df2. df2 contains all the gene names, with their ontological names in its first column; some of those genes are missing from df1.
Expected output:
> df1.new
sample1 sample2 sample3
hr1 1 0 45
gene2 0 0 56
ftp 50 0 11
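The expected output amounts to looking up each df1 row name among df2's row names and taking the matching name. A minimal base R sketch of that lookup (no tidyverse needed), assuming df1 has the row names shown in the printed output (gene1, gene2, gene4) rather than those from the creation code, and that every gene in df1 appears exactly once in df2:

```r
# Rebuild the example data as printed (note: gene4, not gene3, in df1)
df1 <- data.frame(sample1 = c(1, 0, 50), sample2 = c(0, 0, 0), sample3 = c(45, 56, 11),
                  row.names = c("gene1", "gene2", "gene4"))
df2 <- data.frame(name = c("hr1", "gene2", "exoc like exoc1 in drosophila", "ftp"),
                  desc = c("protein", "unknown", "fake immunity known for fighting viruses", "like ftp1"),
                  row.names = c("gene1", "gene2", "gene3", "gene4"),
                  stringsAsFactors = FALSE)

# Position of each df1 row name among df2's row names (NA if absent)
idx <- match(rownames(df1), rownames(df2))

# Copy df1 and replace its row names with the matching ontological names
# (assumes every df1 gene is present in df2, so idx has no NA)
df1.new <- df1
rownames(df1.new) <- df2$name[idx]
df1.new
```

Because row names must be unique and non-missing, this assignment would fail if a df1 gene were absent from df2 or if two genes shared an ontological name.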
I'm not familiar enough with the tidyverse to update names that way, and the problem I'm having is how my dataframes are set up: I'm trying to update index (row) names. I've tried adapting the only similar question I could find (R - replace specific values in df with values from other df by matching row names), but I am trying to update the index row names themselves.
I've tried variations of:
df1 <- df1[na.omit(match(rownames(df1), df2$name)),] # throws an error
library(dplyr)
library(tibble)
rownames_to_column(df1) %>% rows_update(df2 %>% rownames_to_column(df1), by ="rowname") %>% column_to_rownames(df1) # Error, Names repair functions must return a character vector
I'm having trouble because it's the index I want to match, and then update with a column from a second data frame.
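One thing to note about the first attempt: match(rownames(df1), df2$name) compares gene IDs with ontological names, so only an ID that happens to equal its own name (here gene2) lines up, and subsetting with [ then selects rows rather than renaming them. Matching against rownames(df2) instead gives a usable index. A small illustration using plain vectors copied from the printed sample data (the variable names are just for this sketch):

```r
# Values copied from the printed sample data
ids       <- c("gene1", "gene2", "gene4")                              # rownames(df1)
df2_ids   <- c("gene1", "gene2", "gene3", "gene4")                     # rownames(df2)
df2_names <- c("hr1", "gene2", "exoc like exoc1 in drosophila", "ftp") # df2$name

# IDs vs. ontological names: only "gene2" coincides
match(ids, df2_names)   # NA 2 NA

# IDs vs. df2's row names: every gene is found
idx <- match(ids, df2_ids)
idx                     # 1 2 4

# Look up the new names by position
df2_names[idx]          # "hr1" "gene2" "ftp"
```

In the second attempt, df2 %>% rownames_to_column(df1) and column_to_rownames(df1) pass the data frame df1 where a column name string (such as "rowname") is expected; that is a likely source of the names-repair error.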
Another one (btw, your code does not match the dataframes you printed):
> map = df2$name
> names(map) = rownames(df2)
> df1.new = df1
> rownames(df1.new) = map[rownames(df1)]
> df1.new
sample1 sample2 sample3
hr1 1 0 45
gene2 0 0 56
exoc like exoc1 in drosophila 50 0 11
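The lookup above works because map is a named character vector indexed by gene ID. If df1 contains an ID that is missing from df2, map[rownames(df1)] returns NA for it and the row-name assignment fails, since row names cannot be missing. A minimal guarded sketch, assuming such genes should simply keep their original ID; the gene5 row is hypothetical and added only to show the fallback:

```r
# gene5 is a hypothetical extra gene that has no entry in df2
df1 <- data.frame(sample1 = c(1, 0, 50, 3), sample2 = c(0, 0, 0, 1), sample3 = c(45, 56, 11, 7),
                  row.names = c("gene1", "gene2", "gene4", "gene5"))
df2 <- data.frame(name = c("hr1", "gene2", "exoc like exoc1 in drosophila", "ftp"),
                  row.names = c("gene1", "gene2", "gene3", "gene4"),
                  stringsAsFactors = FALSE)

# Named character vector: gene ID -> ontological name
map <- setNames(df2$name, rownames(df2))

# Look up each df1 ID; IDs not in map come back as NA
new_names <- unname(map[rownames(df1)])

# Fall back to the original ID wherever no mapping exists
missing <- is.na(new_names)
new_names[missing] <- rownames(df1)[missing]

df1.new <- df1
rownames(df1.new) <- new_names
df1.new
```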
The code you use to create df1 and df2 does not match the df1 and df2 that you show, but here is a way to get the result column I think you want. You can then remove any columns you don't want.
library(dplyr)
library(tibble)
df1 %>%
  rownames_to_column(var = "gene") %>%
  left_join(
    df2 %>% rownames_to_column(var = "gene"),
    by = "gene"
  ) %>%
  mutate(result = ifelse(desc == "unknown", gene, desc))
# gene sample1 sample2 sample3 name desc
# 1 gene1 1 0 45 hr1 protein
# 2 gene2 0 0 56 gene2 unknown
# 3 gene3 50 0 11 exoc like exoc1 in drosophila fake immunity known for fighting viruses
# result
# 1 protein
# 2 gene2
# 3 fake immunity known for fighting viruses
Here is a slightly modified version of @Gregor Thomas's answer:
library(tibble)
library(dplyr)
left_join(df1 %>%
            rownames_to_column("gene"),
          df2 %>%
            rownames_to_column("gene"),
          by = "gene") %>%
  column_to_rownames("name") %>%
  select(starts_with("sample"))
sample1 sample2 sample3
hr1 1 0 45
gene2 0 0 56
ftp 50 0 11
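column_to_rownames("name") needs name to be unique and non-missing, so this pipeline would stop if a df1 gene had no entry in df2 (its name would be NA after left_join) or if two genes shared a name. A minimal sketch of a more defensive variant, assuming genes without a mapping should keep their original ID and that a numeric suffix added by make.unique() is an acceptable way to disambiguate repeated names:

```r
library(dplyr)
library(tibble)

# Example data as printed in the question (gene4, not gene3, in df1)
df1 <- data.frame(sample1 = c(1, 0, 50), sample2 = c(0, 0, 0), sample3 = c(45, 56, 11),
                  row.names = c("gene1", "gene2", "gene4"))
df2 <- data.frame(name = c("hr1", "gene2", "exoc like exoc1 in drosophila", "ftp"),
                  desc = c("protein", "unknown", "fake immunity known for fighting viruses", "like ftp1"),
                  row.names = c("gene1", "gene2", "gene3", "gene4"),
                  stringsAsFactors = FALSE)

df1.new <- df1 %>%
  rownames_to_column("gene") %>%
  # Bring in only the name column from df2; left_join keeps df1's row order
  left_join(df2 %>% rownames_to_column("gene") %>% select(gene, name), by = "gene") %>%
  # Keep the original ID when there is no mapping, then de-duplicate
  mutate(name = make.unique(coalesce(name, gene))) %>%
  select(-gene) %>%
  column_to_rownames("name")

df1.new
```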
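Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.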