Join single column values to multiple column names and expand dataframe
I am creating a single summary table from multiple files. I have imported data from 4 files, file1...file4, and done some merging/manipulation using the reshape2 package, so my data looks like this:
chr.list positions sample ref alt depth freq sum min.prop
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file3 C c 52 1 18 NA
chr1 12428 file3 C g 52 2 4 0.5000000
chr1 12428 file1 C g 52 2 4 0.5000000
chr1 12428 file2 C t 52 2 16 0.1875000
Now, I want to separate the data for each of the four files but keep it in the same dataframe. I want to keep the chr.list, positions, ref and alt columns intact, but remove the sample column, merge its values with the depth, freq, sum and min.prop columns, and cast the data so that it looks like this:
chr.list positions ref alt file1.depth file1.freq file1.sum file1.min.prop file2.depth file2.freq file2.sum file2.min.prop file3.depth file3.freq file3.sum file3.min.prop
chr1 12428 C a NA NA NA NA NA NA NA NA NA NA NA NA
chr1 12428 C c NA NA NA NA NA NA NA NA 52 1 18 NA
chr1 12428 C g 52 2 4 0.5 NA NA NA NA 52 2 4 0.5
chr1 12428 C t NA NA NA NA 52 2 16 0.18 NA NA NA NA
How can I do it? I am guessing I should use dcast, but I am not sure.
Thanks!
The reshaping is straightforward:
dd <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "chr.list positions sample ref alt depth freq sum min.prop
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file3 C c 52 1 18 NA
chr1 12428 file3 C g 52 2 4 0.5000000
chr1 12428 file1 C g 52 2 4 0.5000000
chr1 12428 file2 C t 52 2 16 0.1875000")
n <- names(dd)
# Drop rows that are exact duplicates (the repeated file4 row) before casting.
# Note: duplicated(dd$sample) would keep only the first row per file and
# silently drop the file3 / g row.
rr <- reshape(dd[!duplicated(dd), ], direction = 'wide', sep = '~',
              idvar = n[c(1:2,4:5)], v.names = n[6:9], timevar = n[3])
# chr.list positions ref alt depth~file4 freq~file4 sum~file4 min.prop~file4
# 1 chr1 12428 C a 52 2 14 0.2857143
# 3 chr1 12428 C c NA NA NA NA
# 4 chr1 12428 C g NA NA NA NA
# 6 chr1 12428 C t NA NA NA NA
# depth~file3 freq~file3 sum~file3 min.prop~file3 depth~file1 freq~file1
# 1 NA NA NA NA NA NA
# 3 52 1 18 NA NA NA
# 4 52 2 4 0.5 52 2
# 6 NA NA NA NA NA NA
# sum~file1 min.prop~file1 depth~file2 freq~file2 sum~file2 min.prop~file2
# 1 NA NA NA NA NA NA
# 3 NA NA NA NA NA NA
# 4 4 0.5 NA NA NA NA
# 6 NA NA 52 2 16 0.1875
The column order and column names aren't a reshape problem, so you need to handle those yourself:
Find the variables you cast (the ones containing ~), split them on the tilde, reverse the pieces, and collapse them back into a single string. Then reorder the columns, here alphabetically by name:
idx <- grepl('~', names(rr))
# "depth~file4" -> c("depth", "file4") -> "file4_depth"
names(rr)[idx] <- sapply(strsplit(names(rr)[idx], '~'),
                         function(x) paste0(rev(x), collapse = '_'))
# keep the four id columns first, then the cast columns sorted by name
rr[, c(1:4, order(names(rr)[-(1:4)]) + 4)]
# chr.list positions ref alt file1_depth file1_freq file1_min.prop file1_sum
# 1 chr1 12428 C a NA NA NA NA
# 3 chr1 12428 C c NA NA NA NA
# 4 chr1 12428 C g 52 2 0.5 4
# 6 chr1 12428 C t NA NA NA NA
# file2_depth file2_freq file2_min.prop file2_sum file3_depth file3_freq
# 1 NA NA NA NA NA NA
# 3 NA NA NA NA 52 1
# 4 NA NA NA NA 52 2
# 6 52 2 0.1875 16 NA NA
# file3_min.prop file3_sum file4_depth file4_freq file4_min.prop file4_sum
# 1 NA NA 52 2 0.2857143 14
# 3 NA 18 NA NA NA NA
# 4 0.5 4 NA NA NA NA
# 6 NA NA NA NA NA NA
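Sorting the names alphabetically puts min.prop before sum within each file, which is not quite the layout shown in the question. If the question's layout is wanted (files in sorted order, variables in their original order, names such as file1.depth), one option is to build the target column list explicitly instead of sorting. The following is only a sketch in base R under that assumption; the names id_cols, val_cols, files, grid, from, to and out are illustrative and not part of the original answer, and unique(dd) is used so that only exact duplicate rows are dropped.

```r
# Same sample data as in the question.
dd <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "chr.list positions sample ref alt depth freq sum min.prop
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file4 C a 52 2 14 0.2857143
chr1 12428 file3 C c 52 1 18 NA
chr1 12428 file3 C g 52 2 4 0.5000000
chr1 12428 file1 C g 52 2 4 0.5000000
chr1 12428 file2 C t 52 2 16 0.1875000")

id_cols  <- c("chr.list", "positions", "ref", "alt")
val_cols <- c("depth", "freq", "sum", "min.prop")

# Drop exact duplicate rows, then cast to wide format.
wide <- reshape(unique(dd), direction = "wide", sep = "~",
                idvar = id_cols, v.names = val_cols, timevar = "sample")

# One row per (variable, file) pair; expand.grid varies the first
# argument fastest, so the variables cycle within each file.
files <- sort(unique(dd$sample))
grid  <- expand.grid(var = val_cols, file = files, stringsAsFactors = FALSE)
from  <- paste(grid$var, grid$file, sep = "~")  # names produced by reshape()
to    <- paste(grid$file, grid$var, sep = ".")  # names used in the question

# Select the columns in the desired order and rename them in one step.
out <- wide[, c(id_cols, from)]
names(out) <- c(id_cols, to)
rownames(out) <- NULL
out
```

Because the order comes from val_cols and files rather than from sorting strings, changing either vector is enough to change the layout, for example to leave out file4 as the question's sample output does.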