Reformatting an R data.frame

Question

I have a data.frame in this format:

set.seed(1)
pl.mat <- matrix(rnorm(500 * 1000), nrow = 500, ncol = 1000)
colnames(pl.mat) <- gsub("\\s+", "", apply(expand.grid(paste("pl", 1:10, sep = ""), 1:100), 1, function(x) paste(unlist(x), collapse = ".")), perl = TRUE)
df <- cbind(data.frame(id=1:500,group.id=rep(1:25,20)),pl.mat)

> df[1:5,1:5]
  id group.id      pl1.1       pl2.1       pl3.1
1  1        1 -0.6264538  0.07730312  1.13496509
2  2        2  0.1836433 -0.29686864  1.11193185
3  3        3 -0.8356286 -1.18324224 -0.87077763
4  4        4  1.5952808  0.01129269  0.21073159
5  5        5  0.3295078  0.99160104  0.06939565

The df$id values are grouped by df$group.id. Each remaining column name consists of an experimental plate id (pl1 to pl10) followed, after the period separator, by a well id (1-100). Hence each plate has 100 columns.

I want to build a new data.frame with these columns: df$id, df$group.id, the well id, and one column per plate.

That is, this format:

id  group.id  well.id         pl1         pl2         pl3
1   1         1        -0.6264538  0.07730312  1.13496509
1   1         2               ...         ...         ...
.
.
.
2   2         1               ...         ...         ...
.
.
.
500 25        100             ...         ...         ...

Is there a good, concise way to do this?

Answer 1

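library(tidyr)  # gather(), separate(), spread(); also re-exports %>%

df %>%
  gather(var, val, -id, -group.id) %>%      # wide to long: one row per (id, original column)
  separate(var, c("pl.id", "well.id")) %>%  # split names such as "pl1.1" into plate and well
  spread(pl.id, val)                        # back to wide: one column per plate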
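
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a sketch only, assuming tidyr 1.1.0 or later (for names_transform), the same reshape can be written in one call using the ".value" sentinel. The data frame `small` below is a hypothetical, scaled-down stand-in for df (4 ids, 2 plates, 3 wells), and `res` is an illustrative name.

```r
library(tidyr)

# Hypothetical stand-in for df: 4 ids, 2 plates, 3 wells, same naming scheme.
set.seed(1)
small <- data.frame(id = 1:4, group.id = rep(1:2, 2))
for (w in 1:3) for (p in 1:2) small[[paste0("pl", p, ".", w)]] <- rnorm(4)

# ".value" tells pivot_longer() that the part of each name before the period
# (pl1, pl2) names an output column; the part after it goes into well.id.
res <- pivot_longer(
  small,
  cols = -c(id, group.id),
  names_to = c(".value", "well.id"),
  names_sep = "\\.",
  names_transform = list(well.id = as.integer)
)
```

Converting well.id to integer keeps the wells in numeric order (1, 2, ..., 100) if the result is later sorted; as character strings they would sort as "1", "10", "100", "11", and so on.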

Answer 2

Dan, you could create a new data.frame with the desired columns. Let's say you want the columns df$id and df$group.id:

newDF <- as.data.frame(cbind(df$id, df$group.id))

If there are too many columns to write out by name, you can select them by index instead:

newDF <- as.data.frame(cbind(df[,2], df[,5]))

Ranges of indices work as well (df has 1002 columns, so the upper bound must not exceed 1002):

newDF <- as.data.frame(cbind(df[,2:210], df[,507:1002]))
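
One detail worth knowing about the selections above: cbind() on bare vectors returns a matrix without column names, so the resulting data.frame gets default names, whereas subsetting the data.frame directly keeps the original names. A minimal sketch, using a hypothetical three-row data frame `mini` in place of df:

```r
# Hypothetical miniature of df, for illustration only.
mini <- data.frame(id = 1:3, group.id = c(1, 1, 2), pl1.1 = c(0.1, 0.2, 0.3))

by.cbind <- as.data.frame(cbind(mini$id, mini$group.id))  # names are lost
by.name  <- mini[, c("id", "group.id")]                   # select by name
by.index <- mini[, c(1, 3)]                               # select by position

names(by.cbind)  # "V1" "V2"
names(by.name)   # "id" "group.id"
names(by.index)  # "id" "pl1.1"
```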

Does this work for you? Another solution would be to use a loop and construct the indices or column names dynamically. Here is a draft:

for (i in 1:10) {
  # build the column name "pl<i>.1" and look it up with [[ ]] rather than eval(parse())
  print(head(df[[paste0("pl", i, ".1")]]))
}

Here, the column names pl1.1 up to pl10.1 are built dynamically.
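
Building on the idea of constructing names dynamically, the columns of each plate can also be collected by matching a prefix in names(). This is only a sketch: `mini` is a hypothetical miniature of df (3 ids, 2 plates, 2 wells) and `by.plate` is an illustrative name.

```r
# Hypothetical miniature of df, using the same "pl<plate>.<well>" naming scheme.
mini <- data.frame(id = 1:3, group.id = 1:3,
                   pl1.1 = 1:3, pl2.1 = 4:6, pl1.2 = 7:9, pl2.2 = 10:12)

# For each plate, find its columns by the "pl<i>." prefix and keep the id columns.
by.plate <- lapply(1:2, function(i) {
  cols <- grep(paste0("^pl", i, "\\."), names(mini), value = TRUE)
  mini[, c("id", "group.id", cols)]
})
names(by.plate) <- paste0("pl", 1:2)
```

The escaped period at the end of the pattern matters on the full data: without it, the pattern for pl1 would also match the columns of pl10.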

Best regards, Thorsten

Answer 1
1, accepted, 2017-05-25 07:23:49

Answer 2
1, 2017-05-25 07:40:31