
How Do I Create an Array in R from a Data Frame with 3 Columns?

I currently have a data frame with three columns, as recreated below:

SNPname             AnimalID  AlleleFrequency
ARS-BFGL-BAC-10172  1         0.0
ARS-BFGL-BAC-1020   2         0.5
ARS-BFGL-BAC-10345  3         1.0
ARS-BFGL-BAC-10591  4         0.5
and so on...

For each animal, I have ~777,000 SNPs and their corresponding allele frequencies. (To be exact, I have 777,962 SNPs on 52 animals for a total of 40,454,024 observations.)

Basically, I need to create an array from this data so that the rows are the SNPs, the column is the allele frequency, and the 3rd dimension of the array is the AnimalID. So in total, I need my dimensions to be [777962 1 52]. However, for the life of me, I cannot figure out how to make this array. I've tried the array command and the abind command, among a few other things out of desperation, but I have not had any luck.

This is the code that was originally suggested to me by a friend who knows more about R than I do:

array = abind(df, along = 3)

but that gives me an array with these dimensions: [40454024 2 1], which isn't right.

Here are some other things I've tried that haven't worked:

array = array(data = df$`SNPname`, df$AlleleFrequency, df$`AnimalID`)
array = abind(data = df$`SNPname`, df$AlleleFrequency, df$`AnimalID`)
array = array(c(df$`SNPname`, df$AlleleFrequency), dim =c(df$`SNPname`, df$AlleleFrequency, df$`AnimalID`))

If someone could help point me in the right direction, I would be eternally grateful. Thanks in advance!!

If you mean you need a 3d array with the three columns as dimensions, then each cell/value will be a count. For this, use xtabs (or table):

xtabs(~SNPname + AlleleFrequency + AnimalID, data = dat)
# , , AnimalID = 1
#                     AlleleFrequency
# SNPname              0 0.5 1
#   ARS-BFGL-BAC-10172 1   0 0
#   ARS-BFGL-BAC-1020  0   0 0
#   ARS-BFGL-BAC-10345 0   0 0
#   ARS-BFGL-BAC-10591 0   0 0
# , , AnimalID = 2
#                     AlleleFrequency
# SNPname              0 0.5 1
#   ARS-BFGL-BAC-10172 0   0 0
#   ARS-BFGL-BAC-1020  0   1 0
#   ARS-BFGL-BAC-10345 0   0 0
#   ARS-BFGL-BAC-10591 0   0 0
# , , AnimalID = 3
#                     AlleleFrequency
# SNPname              0 0.5 1
#   ARS-BFGL-BAC-10172 0   0 0
#   ARS-BFGL-BAC-1020  0   0 0
#   ARS-BFGL-BAC-10345 0   0 1
#   ARS-BFGL-BAC-10591 0   0 0
# , , AnimalID = 4
#                     AlleleFrequency
# SNPname              0 0.5 1
#   ARS-BFGL-BAC-10172 0   0 0
#   ARS-BFGL-BAC-1020  0   0 0
#   ARS-BFGL-BAC-10345 0   0 0
#   ARS-BFGL-BAC-10591 0   1 0
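
To see how that count table behaves as an array, here is a small sketch that inspects its shape and pulls out one cell. It recreates the four-row `dat` from the Data section so that it runs on its own; the name `counts` is just an illustrative choice.

```r
# Same four-row sample as in the Data section
dat <- data.frame(
  SNPname = c("ARS-BFGL-BAC-10172", "ARS-BFGL-BAC-1020",
              "ARS-BFGL-BAC-10345", "ARS-BFGL-BAC-10591"),
  AnimalID = 1:4,
  AlleleFrequency = c(0, 0.5, 1, 0.5)
)

# One dimension per column; each cell counts matching rows
counts <- xtabs(~ SNPname + AlleleFrequency + AnimalID, data = dat)

dim(counts)              # SNPs x distinct frequencies x animals
names(dimnames(counts))  # the three column names

# Cells are addressed by their labels, given as character strings
counts["ARS-BFGL-BAC-1020", "0.5", "2"]
```

Note that the second dimension has one slot per distinct frequency value, so this is not the [SNPs, 1, animals] shape described in the question.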

If you mean that you need the frequency to be the value in each cell and not the count, then while you can create a 3d array for it, it will never have more than 2d of data. One way to get this is with tidyr::pivot_wider:

tidyr::pivot_wider(dat, id_cols = "SNPname", names_from = "AnimalID", values_from = "AlleleFrequency")
# # A tibble: 4 x 5
#   SNPname              `1`   `2`   `3`   `4`
#   <chr>              <dbl> <dbl> <dbl> <dbl>
# 1 ARS-BFGL-BAC-10172     0  NA      NA  NA  
# 2 ARS-BFGL-BAC-1020     NA   0.5    NA  NA  
# 3 ARS-BFGL-BAC-10345    NA  NA       1  NA  
# 4 ARS-BFGL-BAC-10591    NA  NA      NA   0.5
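
If the [SNPs, 1, animals] shape from the question is really what is wanted, one possible base-R route is to build the SNP-by-animal matrix first and then add a singleton middle dimension. The sketch below is only an illustration under assumptions: `toy` is made-up data (2 SNPs measured on 3 animals, one row per SNP/animal pair), and `m` and `arr` are hypothetical names, not anything from the question.

```r
# Made-up complete data: every SNP observed once on every animal
toy <- data.frame(
  SNPname = rep(c("snpA", "snpB"), times = 3),
  AnimalID = rep(1:3, each = 2),
  AlleleFrequency = c(0, 0.5, 1, 0.5, 0.5, 0)
)

# SNP x animal matrix of frequencies; with one row per pair,
# mean() just returns that single value (empty cells become NA)
m <- tapply(toy$AlleleFrequency, list(toy$SNPname, toy$AnimalID), mean)

# Insert a length-1 second dimension: [SNPs, 1, animals]
arr <- array(
  m,
  dim = c(nrow(m), 1, ncol(m)),
  dimnames = list(SNPname = rownames(m),
                  value = "AlleleFrequency",
                  AnimalID = colnames(m))
)

dim(arr)
arr["snpA", 1, "2"]  # frequency of snpA in animal 2
```

Because the middle dimension has length 1, `arr` carries exactly the same information as the matrix `m`. At the size given in the question (777,962 SNPs by 52 animals) either object would hold about 40.5 million doubles, roughly 320 MB.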

Data

dat <- structure(list(
  SNPname = c("ARS-BFGL-BAC-10172", "ARS-BFGL-BAC-1020",
              "ARS-BFGL-BAC-10345", "ARS-BFGL-BAC-10591"),
  AnimalID = 1:4,
  AlleleFrequency = c(0, 0.5, 1, 0.5)
), class = "data.frame", row.names = c(NA, -4L))
