Reformatting an R data.frame
I have a data.frame of this format:
set.seed(1)
pl.mat <-matrix(rnorm(500*1000),nrow=500,ncol=1000)
colnames(pl.mat) <- gsub("\\s+","",apply(expand.grid(paste("pl",1:10,sep=""),1:100),1,function(x) paste(unlist(x),collapse=".")),perl=T)
df <- cbind(data.frame(id=1:500,group.id=rep(1:25,20)),pl.mat)
> df[1:5,1:5]
id group.id pl1.1 pl2.1 pl3.1
1 1 1 -0.6264538 0.07730312 1.13496509
2 2 2 0.1836433 -0.29686864 1.11193185
3 3 3 -0.8356286 -1.18324224 -0.87077763
4 4 4 1.5952808 0.01129269 0.21073159
5 5 5 0.3295078 0.99160104 0.06939565
The df$id values are grouped by df$group.id. Each remaining column name starts with an experimental plate id (pl1 - pl10), and the integer following the period separator is a well id (1-100). Hence each plate has 100 columns.
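The naming scheme can be checked by splitting a few column names at the period. This is only a minimal sketch: the sample names below are hand-picked for illustration, and plate.id and well.id are hypothetical variable names.

```r
# Hand-picked sample of column names following the "plate.well" scheme
col.names <- c("pl1.1", "pl2.1", "pl10.1", "pl1.100", "pl10.100")

# Split each name at the literal period: left part = plate, right part = well
parts <- strsplit(col.names, ".", fixed = TRUE)
plate.id <- vapply(parts, `[`, character(1), 1)
well.id <- as.integer(vapply(parts, `[`, character(1), 2))

print(data.frame(col.names, plate.id, well.id))
```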
I want to build a new data.frame with these columns: df$id, df$group.id, well id, and all the plates.
Meaning this format:
id group.id well.id pl1 pl2 pl3
1 1 1 -0.6264538 0.07730312 1.13496509
1 1 2 ... ... ...
.
.
.
1 2 1 ... ... ...
.
.
.
500 25 100 ... ... ...
Any good concise code for that?
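library(dplyr)  # provides %>%
library(tidyr)  # provides gather(), separate(), spread()
df %>%
  gather(var, val, -id, -group.id) %>%                     # long format: one row per id and plate/well column
  separate(var, c("pl.id", "well.id"), convert = TRUE) %>% # "pl1.1" -> pl.id "pl1", well.id 1
  spread(pl.id, val)                                        # back to wide: one column per plate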
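If the tidyverse is not available, the same reshaping might be sketched in base R. The following is an illustration under stated assumptions, not the answerer's method: it uses a small made-up stand-in (toy, with 4 ids, 2 plates and 3 wells) instead of the real df, and it assumes every well has a column for every plate.

```r
# Toy stand-in for df: 4 ids, 2 plates, 3 wells (sizes are assumptions)
set.seed(1)
n.id <- 4; n.pl <- 2; n.well <- 3
toy.mat <- matrix(rnorm(n.id * n.pl * n.well), nrow = n.id)
colnames(toy.mat) <- paste0("pl", rep(1:n.pl, times = n.well),
                            ".", rep(1:n.well, each = n.pl))
toy <- cbind(data.frame(id = 1:n.id, group.id = rep(1:2, 2)), toy.mat)

key.cols <- c("id", "group.id")
value.cols <- setdiff(names(toy), key.cols)
# Well id of each value column: the integer after the period
well.of.col <- as.integer(sub("^pl[0-9]+\\.", "", value.cols))

# One block of rows per well: keep the key columns, add well.id,
# and rename "pl1.3" -> "pl1" so the blocks can be stacked
blocks <- lapply(sort(unique(well.of.col)), function(w) {
  block <- toy[, value.cols[well.of.col == w], drop = FALSE]
  names(block) <- sub("\\.[0-9]+$", "", names(block))
  cbind(toy[, key.cols], well.id = w, block)
})
long <- do.call(rbind, blocks)
long <- long[order(long$id, long$well.id), ]
rownames(long) <- NULL
print(head(long))
```

The result has one row per id and well, with the columns id, group.id, well.id and one column per plate.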
Dan, you could create a new data.frame with the desired columns. Let's say you want the columns df$id and df$group.id:
newDF <- as.data.frame(cbind(df$id, df$group.id))
Now, if you have so many columns that you cannot write them all out, you could use column indices as well:
newDF <- as.data.frame(cbind(df[,2], df[,5]))
Ranges work as well:
newDF <- as.data.frame(cbind(df[,2:210], df[,507:1002]))  # df has 1002 columns, so the range must end there
Does this work for you? Another solution would be to use a loop and construct the indices or column names dynamically. Here is a draft:
for(i in 1:10) {
print(eval(parse(text=paste("df$id", i, sep = ""))))
}
Here, the column names df$id1 up to df$id10 are built dynamically. These names are only placeholders: the example df has no such columns, so substitute names that exist, such as pl1.1.
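The same dynamic lookup can also be sketched without eval(parse()), by passing a constructed name to [[ ]]. The data frame toy and its values below are made up for illustration and are not part of the original question.

```r
# Toy data frame; the values are made up for illustration
toy <- data.frame(id = 1:3,
                  pl1.1 = c(0.1, 0.2, 0.3),
                  pl2.1 = c(1.1, 1.2, 1.3))

for (i in 1:2) {
  col.name <- paste0("pl", i, ".1")  # builds "pl1.1", then "pl2.1"
  print(toy[[col.name]])             # [[ ]] looks a column up by its name string
}

# All columns of one plate can be selected by name pattern instead of by index
pl1.cols <- grep("^pl1\\.", names(toy), value = TRUE)
print(pl1.cols)
```

Selecting by name tends to be more robust than fixed index ranges, since it keeps working if columns are added or reordered.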
Best regards, Thorsten