How Do I Create an Array in R from a Data Frame with 3 Columns?
I currently have a data frame with three columns, as recreated below:
SNPname | AnimalID | AlleleFrequency |
---|---|---|
ARS-BFGL-BAC-10172 | 1 | 0.0 |
ARS-BFGL-BAC-1020 | 2 | 0.5 |
ARS-BFGL-BAC-10345 | 3 | 1.0 |
ARS-BFGL-BAC-10591 | 4 | 0.5 |
and so on... | and so on... | and so on... |
For each animal, I have ~777,000 SNPs and their corresponding allele frequencies. (To be exact, I have 777,962 SNPs on 52 animals, for a total of 40,454,024 observations.)
Basically, I need to create an array from this data so that the rows are the SNPs, the column is the allele frequency, and the third dimension of the array is the AnimalID. So in total, I need my dimensions to be [777962 1 52]. However, for the life of me, I cannot figure out how to make this array. I've tried the array command and the abind command, among a few other things out of desperation, but I have not had any luck.
This is the code that was originally suggested to me by a friend who knows more about R than I do:
array = abind(df, along = 3)
but that gives me an array with these dimensions: [40454024 2 1], which isn't right.
Here are some other things I've tried that haven't worked:
array = array(data = df$`SNPname`, df$AlleleFrequency, df$`AnimalID`)
array = abind(data = df$`SNPname`, df$AlleleFrequency, df$`AnimalID`)
array = array(c(df$`SNPname`, df$AlleleFrequency), dim =c(df$`SNPname`, df$AlleleFrequency, df$`AnimalID`))
If someone could help point me in the right direction, I would be eternally grateful. Thanks in advance!!
If you mean you need a 3d array with the three columns as dimensions, this means each cell/value will be a count. For this, use `xtabs` (or `table`):
xtabs(~SNPname + AlleleFrequency + AnimalID, data = dat)
# , , AnimalID = 1
# AlleleFrequency
# SNPname 0 0.5 1
# ARS-BFGL-BAC-10172 1 0 0
# ARS-BFGL-BAC-1020 0 0 0
# ARS-BFGL-BAC-10345 0 0 0
# ARS-BFGL-BAC-10591 0 0 0
# , , AnimalID = 2
# AlleleFrequency
# SNPname 0 0.5 1
# ARS-BFGL-BAC-10172 0 0 0
# ARS-BFGL-BAC-1020 0 1 0
# ARS-BFGL-BAC-10345 0 0 0
# ARS-BFGL-BAC-10591 0 0 0
# , , AnimalID = 3
# AlleleFrequency
# SNPname 0 0.5 1
# ARS-BFGL-BAC-10172 0 0 0
# ARS-BFGL-BAC-1020 0 0 0
# ARS-BFGL-BAC-10345 0 0 1
# ARS-BFGL-BAC-10591 0 0 0
# , , AnimalID = 4
# AlleleFrequency
# SNPname 0 0.5 1
# ARS-BFGL-BAC-10172 0 0 0
# ARS-BFGL-BAC-1020 0 0 0
# ARS-BFGL-BAC-10345 0 0 0
# ARS-BFGL-BAC-10591 0 1 0
If you mean that you need the frequency to be the value in each cell and not the count, then while you can create a 3d array for it, it will never have more than 2d of data. One way to get this is with `tidyr::pivot_wider`:
tidyr::pivot_wider(dat, "SNPname", names_from = "AnimalID", values_from = "AlleleFrequency")
# # A tibble: 4 x 5
# SNPname `1` `2` `3` `4`
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 ARS-BFGL-BAC-10172 0 NA NA NA
# 2 ARS-BFGL-BAC-1020 NA 0.5 NA NA
# 3 ARS-BFGL-BAC-10345 NA NA 1 NA
# 4 ARS-BFGL-BAC-10591 NA NA NA 0.5
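If you still want the literal [nSNP, 1, nAnimal] shape from the question, one possible base-R sketch is to fill a SNP-by-animal matrix using index pairs and then add a singleton middle dimension. The names `snp`, `animal`, `m`, and `arr` are just illustrative, and the sample data is repeated here so the snippet runs on its own. This is one way to do it, not necessarily the best one for 40 million rows.

```r
# Same sample data as in the Data section of this answer.
dat <- data.frame(
  SNPname = c("ARS-BFGL-BAC-10172", "ARS-BFGL-BAC-1020",
              "ARS-BFGL-BAC-10345", "ARS-BFGL-BAC-10591"),
  AnimalID = 1:4,
  AlleleFrequency = c(0, 0.5, 1, 0.5)
)

# Factor codes give each row its position along the SNP and animal dimensions.
snp    <- factor(dat$SNPname, levels = unique(dat$SNPname))
animal <- factor(dat$AnimalID)

# Start from an all-NA SNP x animal matrix and fill it by (row, column) pairs.
# SNP/animal combinations that are absent from the data stay NA.
m <- matrix(NA_real_, nrow = nlevels(snp), ncol = nlevels(animal),
            dimnames = list(levels(snp), levels(animal)))
m[cbind(as.integer(snp), as.integer(animal))] <- dat$AlleleFrequency

# Add a singleton middle dimension: [nSNP, 1, nAnimal].
arr <- array(m, dim = c(nrow(m), 1, ncol(m)),
             dimnames = list(SNPname = rownames(m),
                             value = "AlleleFrequency",
                             AnimalID = colnames(m)))

dim(arr)
# [1] 4 1 4
```

With this layout, `arr[, 1, "3"]` would return the frequencies for animal 3 across all SNPs. If a SNP/animal pair appears more than once in the data, the last value assigned wins, so it may be worth checking for duplicates first.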
Data
dat <- structure(list(SNPname = c("ARS-BFGL-BAC-10172", "ARS-BFGL-BAC-1020", "ARS-BFGL-BAC-10345", "ARS-BFGL-BAC-10591"), AnimalID = 1:4, AlleleFrequency = c(0, 0.5, 1, 0.5)), class = "data.frame", row.names = c(NA, -4L))
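Note that this sample has each SNP on only one animal, which is why most cells come out as 0 or NA above. The question's real data looks complete (777,962 SNPs times 52 animals matches the stated 40,454,024 rows). A rough sketch of how one might check that before reshaping, using a hypothetical `toy` data frame with made-up SNP names:

```r
# Hypothetical complete toy data: 3 SNPs genotyped on 2 animals.
toy <- expand.grid(SNPname = c("snpA", "snpB", "snpC"), AnimalID = 1:2,
                   stringsAsFactors = FALSE)
toy$AlleleFrequency <- c(0, 0.5, 1, 0.5, 0.5, 0)

n_snp    <- length(unique(toy$SNPname))
n_animal <- length(unique(toy$AnimalID))

# TRUE when no SNP/animal pair is repeated.
no_dups <- !anyDuplicated(toy[c("SNPname", "AnimalID")])

# TRUE when the row count equals nSNP * nAnimal; together with no_dups
# this means every SNP/animal pair occurs exactly once.
complete <- nrow(toy) == n_snp * n_animal

# The same arithmetic for the sizes quoted in the question.
sizes_match <- 777962 * 52 == 40454024
```

If both `no_dups` and `complete` are `TRUE`, every cell of the resulting array gets exactly one value and no `NA`s should appear.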