How can I move each value in a data frame column into its own column?
I am using R to build and analyze a data set produced by a colleague's Python script. The script returns the structure below, where 13 is the number of samples and 3128 is the number of trait observations. Each observation is coded as a single digit, so every digit after the sample name represents one column, and its value is the coding for that trait:
13 3128
>1062_0 0000000000[...]
>1066A_0 000001010[...]
>1067A_0 000002010[...]
>1067B_0 110013010[...]
>1067C_0 000024010[...]
>1067D_0 000024010[...]
>1084A_0 200100010[...]
>1084B_0 001005110[...]
>1084C_0 000000010[...]
>1086_0 0100002100[...]
>1087_0 3002040100[...]
>1088_0 0000060111[...]
>C105_0 0000050120[...]
I am trying to get these data into a data frame with 13 rows and 3,128 columns.
I used the read.phylip function from phylotools to read in the file above, and I can get it into a data.frame:
SL_FFR_input <- read.phylip(fil = "matrix.phy")
SL_FFR_frame <- phy2dat(SL_FFR_input)
However, this results in a data frame with two columns: V1 holds the sample names, and V2 holds a single string of all the single-digit codings.
The frame I need is shown below, where the sample names form the row names and each value has its own column.
>1062_0 0 0 0 0 0 0 0 0 0[...]
>1066A_0 0 0 0 0 0 1 0 1 0[...]
>1067A_0 0 0 0 0 0 2 0 1 0[...]
>1067B_0 1 1 0 0 1 3 0 1 0[...]
>1067C_0 0 0 0 0 2 4 0 1 0[...]
>1067D_0 0 0 0 0 2 4 0 1 0[...]
>1084A_0 2 0 0 1 0 0 0 1 0[...]
>1084B_0 0 0 1 0 0 5 1 1 0[...]
>1084C_0 0 0 0 0 0 0 0 1 0[...]
>1086_0 0 1 0 0 0 0 2 1 0[...]
>1087_0 3 0 0 2 0 4 0 1 0[...]
>1088_0 0 0 0 0 0 6 0 1 1[...]
>C105_0 0 0 0 0 0 5 0 1 2[...]
It would be a huge help if someone could point me in the right direction!
I recommend dplyr + tidyr. It's possible to do this with strsplit and rbind, but it's ugly.
library(dplyr)
library(tidyr)
df1 <- data.frame(snames = c('a', 'b', 'c'),
                  digits = c('0000000000000',
                             '0000100000000',
                             '0000000001000'))
result <- df1 %>% separate(digits, paste0('X', 1:13), sep = 1:12)
That will split the column at character positions 1:12 and name the new columns X1 through X13.
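If hard-coding the width is inconvenient, the split positions can be derived from the string length instead. The following is only a sketch: the toy frame `df` and its column names `V1`/`V2` are stand-ins modeled on the question's description, and it assumes every row holds the same number of digits. `convert = TRUE` asks `separate` to turn the resulting one-character strings into integers.

```r
library(tidyr)

# Hypothetical stand-in for the two-column frame described in the question
df <- data.frame(V1 = c("s1", "s2", "s3"),
                 V2 = c("00000", "00102", "31000"),
                 stringsAsFactors = FALSE)

n <- nchar(df$V2[1])               # number of single-digit codes per sample
stopifnot(all(nchar(df$V2) == n))  # assumption: every row has the same width

wide <- separate(df, V2,
                 into = paste0("X", seq_len(n)),
                 sep = seq_len(n - 1),  # cut after positions 1 .. n-1
                 convert = TRUE)        # convert the digit strings to integers
```

With the real data, `n` would come out as 3128 without having to type it in.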
EDIT: for your case, change the 13 to 3128, the 12 to 3127, and "digits" to whatever your column is named.
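For comparison, the base R route mentioned above (strsplit plus rbind) might look roughly like this. It is a sketch, not a tested recipe for the real file: the toy frame `df` uses the first nine digits of three samples from the question, the column names `V1`/`V2` follow the question's description, and it assumes all strings have equal length. It also moves the sample names into the row names, as in the desired output.

```r
# Truncated sample of the question's data (first nine digits only)
df <- data.frame(V1 = c(">1062_0", ">1066A_0", ">1067B_0"),
                 V2 = c("000000000", "000001010", "110013010"),
                 stringsAsFactors = FALSE)

chars <- strsplit(df$V2, "")   # one character vector of single digits per sample
stopifnot(length(unique(lengths(chars))) == 1)  # assumption: equal widths

mat <- do.call(rbind, chars)   # character matrix: samples in rows, traits in columns
storage.mode(mat) <- "integer" # store the codes as integers, keeping the dimensions

result <- as.data.frame(mat)   # columns get the default names V1, V2, ...
rownames(result) <- df$V1      # sample names become the row names
```

Keeping the codes in the matrix `mat` rather than a data frame may be more convenient when there are thousands of columns of the same type.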