Turning a data.frame into a single row
I have these data:
structure(list(type = c("journal", "all", "similar_age_1m", "similar_age_3m",
"similar_age_journal_1m", "similar_age_journal_3m"), count = c("13972",
"754555", "22408", "56213", "508", "1035"), rank = c("13759",
"754043", "22339", "56074", "459", "947"), pct = c("98.48", "99.93",
"99.69", "99.75", "90.35", "91.50")), .Names = c("type", "count",
"rank", "pct"), row.names = c(NA, -6L), class = "data.frame")
I'd like to turn it into a single row, with the names of columns 2:4 prefixed by the corresponding type, e.g. journal.count, journal.rank, ... What is the fastest way to do this? For some reason dcast and reshape are not doing it for me, and my solution is a little too cumbersome.
You mentioned dcast from reshape2, so here is a way with that:
library("reshape2")
# melt to long form (type, variable, value), then cast everything into one row
dcast(melt(dat, id.vars = "type"), 1 ~ variable + type)
That gives:
1 count_all count_journal count_similar_age_1m count_similar_age_3m
1 1 754555 13972 22408 56213
count_similar_age_journal_1m count_similar_age_journal_3m rank_all
1 508 1035 754043
rank_journal rank_similar_age_1m rank_similar_age_3m
1 13759 22339 56074
rank_similar_age_journal_1m rank_similar_age_journal_3m pct_all pct_journal
1 459 947 99.93 98.48
pct_similar_age_1m pct_similar_age_3m pct_similar_age_journal_1m
1 99.69 99.75 90.35
pct_similar_age_journal_3m
1 91.50
The type and variable are separated with _ instead of ., though, and the variable comes first (count_journal rather than journal.count).
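As far as I know, reshape2's dcast() does not let you choose the separator, but the names can be rewritten afterwards. A blanket gsub("_", ".", ...) would mangle the type values, since they contain underscores themselves, so this minimal sketch anchors on the known variable names instead. fix_names is a hypothetical helper, and the variable names are assumed to be exactly count, rank and pct:

```r
# Hypothetical helper: turn dcast-style "variable_type" names (e.g. "count_journal")
# into the "type.variable" form from the question (e.g. "journal.count").
# The variable names are listed explicitly because the types contain "_" too.
fix_names <- function(nm, vars = c("count", "rank", "pct")) {
  pattern <- paste0("^(", paste(vars, collapse = "|"), ")_(.+)$")
  # names that do not match (such as the "1" column) are returned unchanged
  sub(pattern, "\\2.\\1", nm)
}

fix_names(c("1", "count_all", "rank_similar_age_journal_1m"))
```

With the dcast result stored as, say, res, you would then use names(res) <- fix_names(names(res)).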
Here's another way:
y <- as.numeric(as.matrix(x[-1])) # flatten the data.frame
names(y) <- as.vector(outer(x[['type']], names(x)[-1], paste, sep='.'))
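The names line up with the values because both steps work column by column: as.numeric(as.matrix(...)) flattens the count column first, then rank, then pct, and as.vector(outer(...)) unrolls its type-by-column grid in the same order. A small demo on made-up toy data (toy stands in for x) shows the resulting order, which groups by column rather than by type:

```r
# Toy stand-in for the question's data: two types, two value columns
toy <- data.frame(type = c("a", "b"), count = c("1", "2"), rank = c("3", "4"),
                  stringsAsFactors = FALSE)

# Values flattened column by column: all counts, then all ranks
y <- as.numeric(as.matrix(toy[-1]))
# outer() builds a type x column grid of names; as.vector() unrolls it column-wise too
names(y) <- as.vector(outer(toy[["type"]], names(toy)[-1], paste, sep = "."))
y
```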
If you are OK with adding a dummy id variable for the reshaping, you can do this easily in base R too, with type acting as the "time" variable. Assuming your data.frame is called mydf:
mydf$id <- 1
(mydfw <- reshape(mydf, direction = "wide", idvar="id", timevar="type"))
# id count.journal rank.journal pct.journal count.all rank.all pct.all
# 1 1 13972 13759 98.48 754555 754043 99.93
# count.similar_age_1m rank.similar_age_1m pct.similar_age_1m
# 1 22408 22339 99.69
# count.similar_age_3m rank.similar_age_3m pct.similar_age_3m
# 1 56213 56074 99.75
# count.similar_age_journal_1m rank.similar_age_journal_1m
# 1 508 459
# pct.similar_age_journal_1m count.similar_age_journal_3m
# 1 90.35 1035
# rank.similar_age_journal_3m pct.similar_age_journal_3m
# 1 947 91.50
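The same reshape() call can be wrapped in a small function so that the dummy id is only added to a local copy (R passes the data frame by value) and dropped again afterwards. This is a sketch; to_one_row is a hypothetical name and toy is made-up example data:

```r
# Hypothetical wrapper around the base R reshape() approach
to_one_row <- function(df) {
  df$id <- 1  # dummy id: every row belongs to the same single "subject"
  wide <- reshape(df, direction = "wide", idvar = "id", timevar = "type")
  wide$id <- NULL          # drop the dummy column again
  rownames(wide) <- NULL
  wide
}

toy <- data.frame(type = c("a", "b"), count = c("1", "2"), rank = c("3", "4"),
                  stringsAsFactors = FALSE)
res <- to_one_row(toy)
res
```

As in the output above, the names follow reshape()'s variable.type pattern (count.a rather than a.count).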
Cleanup is not too bad either, if you want to reorder your columns:
mydfw <- mydfw[, unlist(sapply(names(mydf), grep, names(mydfw)))]
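The grep() call treats each column name of mydf as a regular expression and matches it anywhere inside the wide names, so a short name such as id could also pick up unrelated columns that happen to contain those letters. A stricter sketch matches on the prefix only; reorder_by_prefix is a hypothetical helper, and unlike the line above it drops columns (such as id) that match none of the prefixes:

```r
# Hypothetical helper: group columns by an exact "<var>." prefix, in the order of vars
reorder_by_prefix <- function(wide, vars) {
  idx <- unlist(lapply(vars, function(v) which(startsWith(names(wide), paste0(v, ".")))))
  wide[, idx, drop = FALSE]
}

# Made-up wide data in the shape reshape() produces
wide <- data.frame(id = 1, count.a = "1", rank.a = "3", count.b = "2", rank.b = "4",
                   stringsAsFactors = FALSE)
out <- reorder_by_prefix(wide, c("count", "rank"))
out
```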
Here's a solution using expand.grid to get the names. To get the data, first subset to remove the first column, which contains the names. Then transpose and convert to numeric.
> eg <- expand.grid(colnames(x[, -1]), x[, 1])
> setNames(as.numeric(t(x[, -1])), paste(eg[[2]], eg[[1]], sep="."))
journal.count journal.rank
13972.00 13759.00
journal.pct all.count
98.48 754555.00
all.rank all.pct
754043.00 99.93
similar_age_1m.count similar_age_1m.rank
22408.00 22339.00
similar_age_1m.pct similar_age_3m.count
99.69 56213.00
similar_age_3m.rank similar_age_3m.pct
56074.00 99.75
similar_age_journal_1m.count similar_age_journal_1m.rank
508.00 459.00
similar_age_journal_1m.pct similar_age_journal_3m.count
90.35 1035.00
similar_age_journal_3m.rank similar_age_journal_3m.pct
947.00 91.50
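This version groups the result by type because the two orders match: expand.grid() varies its first factor fastest (count, rank, pct within each type), and t() turns each row of values into a column, so as.numeric() reads them type by type. A small check on made-up toy data, with stringsAsFactors = FALSE so paste() sees plain strings:

```r
toy <- data.frame(type = c("a", "b"), count = c("1", "2"), rank = c("3", "4"),
                  stringsAsFactors = FALSE)

# First argument varies fastest: (count, a), (rank, a), (count, b), (rank, b)
eg <- expand.grid(var = names(toy)[-1], type = toy$type, stringsAsFactors = FALSE)
# t() makes each type's values a column, so flattening reads them type by type
vals <- setNames(as.numeric(t(toy[-1])), paste(eg$type, eg$var, sep = "."))
vals
```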
#assuming your data is called "test"
result <- as.data.frame(matrix(t(test[-1]),nrow=1),stringsAsFactors=FALSE)
names(result) <- as.vector(t(outer(unique(test$type),names(test[-1]),paste,sep=".")))
str(result)
'data.frame': 1 obs. of 18 variables:
$ journal.count : chr "13972"
$ journal.rank : chr "13759"
$ journal.pct : chr "98.48"
$ all.count : chr "754555"
$ all.rank : chr "754043"
$ all.pct : chr "99.93"
$ similar_age_1m.count : chr "22408"
$ similar_age_1m.rank : chr "22339"
$ similar_age_1m.pct : chr "99.69"
$ similar_age_3m.count : chr "56213"
$ similar_age_3m.rank : chr "56074"
$ similar_age_3m.pct : chr "99.75"
$ similar_age_journal_1m.count: chr "508"
$ similar_age_journal_1m.rank : chr "459"
$ similar_age_journal_1m.pct : chr "90.35"
$ similar_age_journal_3m.count: chr "1035"
$ similar_age_journal_3m.rank : chr "947"
$ similar_age_journal_3m.pct : chr "91.50"
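As the str() output shows, every column of result is still character, because the original columns hold strings. If numbers are wanted, one base R option is to convert every column in place; a minimal sketch using a two-column stand-in for result (values taken from the question):

```r
# Two-column stand-in for the 18-column 'result' above
result <- data.frame(journal.count = "13972", journal.pct = "98.48",
                     stringsAsFactors = FALSE)
# Assigning to result[] keeps the data.frame structure and column names
result[] <- lapply(result, as.numeric)
str(result)
```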
Assuming your data frame is called dat, here's a solution. This is a bit crude and may not be what you're after:
# flatten row by row: journal's count, rank, pct, then all's, and so on
dat2 <- data.frame(matrix(unlist(lapply(1:nrow(dat), function(i) dat[i, -1])), nrow=1))
colnames(dat2) <- paste0(rep(dat[, 1], each=ncol(dat)-1), ".", names(dat)[-1])
dat2
If it doesn't have to be a data frame, this could work too:
dat3 <- as.numeric(unlist(lapply(1:nrow(dat), function(i) dat[i, -1])))
names(dat3) <- paste0(rep(dat[, 1], each=ncol(dat)-1), ".", names(dat)[-1])
dat3