Combine rows in the same data frame based on common values

Question

Most of the approaches I've come across use dplyr to apply a function when combining features; however, I would just like to restructure a single data frame without applying any function to each group.

I have a single data frame that looks like this:

gene_name  chr  nb_pos    nb_ref  nb_alt  m_pos    m_ref  m_alt
ACAA1       3   38173733    C      T     38144875     G     T 
ACAA1       3   38144875    G      T     38144876     G     A

I would like to combine all rows that share a gene_name and chr into a single row, where each gene can have a variable number of rows, so that it looks like this:

gene_name  chr   nb_pos1   nb_ref1   nb_alt1  nb_pos2  nb_ref2  nb_alt2  nb_alt2
ACAA1       3   38173733      C         T     38144875    G       T         T

Does anyone know of a way to do this?

Answer 1

We can use dcast from the devel version of data.table, i.e. v1.9.5. Instructions to install it are here.

Create a sequence column ('ind') based on the grouping columns ('gene_name', 'chr'), and then use dcast, specifying the value.var columns.

library(data.table)
# add a within-group row counter 'ind', then cast wide on all value columns
dcast(setDT(df1)[, ind := seq_len(.N), by = .(gene_name, chr)],
      gene_name + chr ~ ind, value.var = names(df1)[3:8])
#  gene_name chr 1_nb_pos 2_nb_pos 1_nb_ref 2_nb_ref 1_nb_alt 2_nb_alt  1_m_pos
#1:   ACAA1   3 38173733 38144875        C        G     TRUE     TRUE 38144875
#   2_m_pos 1_m_ref 2_m_ref 1_m_alt 2_m_alt
#1: 38144876       G       G       T       A

Or we can use reshape from base R after creating the sequence column with ave.

 df2 <- transform(df1, ind = ave(seq_along(gene_name),
                  gene_name, chr, FUN = seq_along))
 reshape(df2, idvar = c('gene_name', 'chr'), timevar = 'ind',
         direction = 'wide')
 # gene_name chr nb_pos.1 nb_ref.1 nb_alt.1  m_pos.1 m_ref.1 m_alt.1 nb_pos.2
 #1     ACAA1   3 38173733        C     TRUE 38144875       G       T 38144875
 #  nb_ref.2 nb_alt.2  m_pos.2 m_ref.2 m_alt.2
 #1        G     TRUE 38144876       G       A
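
Since the question mentions that each gene can have a variable number of rows, here is a minimal sketch of how the base R approach behaves in that case. The toy data frame and its gene names (GENE_A, GENE_B) are hypothetical and only serve as an illustration; groups with fewer rows should simply end up with NA in the extra columns.

```r
# Hypothetical toy data: GENE_A has two rows, GENE_B has only one
toy <- data.frame(
  gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  chr       = c(1L, 1L, 2L),
  nb_pos    = c(100L, 200L, 300L),
  nb_ref    = c("C", "G", "A"),
  stringsAsFactors = FALSE
)

# Within-group row counter: restarts at 1 for every (gene_name, chr) pair
toy$ind <- ave(seq_along(toy$gene_name), toy$gene_name, toy$chr,
               FUN = seq_along)

# One row per (gene_name, chr); value columns get a .1, .2, ... suffix
wide <- reshape(toy, idvar = c("gene_name", "chr"), timevar = "ind",
                direction = "wide")
print(wide)
```

The number of suffixed column sets is determined by the largest group, so a gene with many rows widens the result for every other gene as well.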

Data

df1 <- structure(list(gene_name = c("ACAA1", "ACAA1"), chr = c(3L, 3L),
    nb_pos = c(38173733L, 38144875L), nb_ref = c("C", "G"),
    nb_alt = c(TRUE, TRUE), m_pos = 38144875:38144876,
    m_ref = c("G", "G"), m_alt = c("T", "A")),
    .Names = c("gene_name", "chr", "nb_pos", "nb_ref", "nb_alt",
    "m_pos", "m_ref", "m_alt"), class = "data.frame",
    row.names = c(NA, -2L))
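
Note that nb_alt is stored as TRUE here, whereas the question shows the allele T. A likely explanation (an assumption, since the answer does not say how the data were read) is that the column was imported with default type guessing, which treats a column containing only T or F as logical. A minimal sketch of the effect and of one way to avoid it with read.table and colClasses, using a shortened copy of the question's table:

```r
# Shortened copy of the question's table (first five columns only)
txt <- "gene_name chr nb_pos nb_ref nb_alt
ACAA1 3 38173733 C T
ACAA1 3 38144875 G T"

# Default type guessing: a column of only T/F values becomes logical
guessed <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(guessed$nb_alt)

# Declaring the column classes up front keeps the alleles as text
typed <- read.table(text = txt, header = TRUE,
                    colClasses = c("character", "integer", "integer",
                                   "character", "character"))
class(typed$nb_alt)
```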

Answer 1: 4, accepted, 2015-05-05 05:44:16
