How to reshape data based on the contents of a data frame in R
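I have an R data frame with a single column and many rows. The column holds a number of individuals and their responses. I want to reshape this data so that each individual gets one row. However, there is no ID variable; the only pattern is that each individual's last entry, the score, is a number. So you can infer that a new row should start after each number.
Existing data format: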
alpha
bravo
charlie
5
alpha
charlie
2
delta
1
dd <- data.frame(xx = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1))
I would like this data rearranged into one of the following forms, listed from most to least desirable:
alpha bravo charlie       5 # Best
alpha       charlie       2
                    delta 1
or
alpha bravo charlie 5
alpha charlie 2
delta 1
or
alpha bravo charlie 5 # Worst but acceptable if above is not possible.
alpha charlie 2
delta 1
Here are a few options and formats:
txt <- readLines(n=9)
alpha
bravo
charlie
5
alpha
charlie
2
delta
1
idx <- grepl("^\\d+$", txt)
group <- cumsum(head(c(FALSE, idx), -1))
unname(split(txt, group))
# [[1]]
# [1] "alpha" "bravo" "charlie" "5"
#
# [[2]]
# [1] "alpha" "charlie" "2"
#
# [[3]]
# [1] "delta" "1"
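To see why the shifted cumulative sum yields one group per individual, the intermediate vectors can be laid side by side. This is a minimal sketch that assumes the input comes from the question's `dd` data frame instead of `readLines()`; `shifted` and `steps` are names introduced here for illustration only.

```r
# The question's data; as.character() guards against factor columns in older R versions
dd <- data.frame(xx = c("alpha", "bravo", "charlie", 5, "alpha", "charlie", 2, "delta", 1))
txt <- as.character(dd$xx)

# TRUE where the entry consists only of digits, i.e. where an individual's record ends
idx <- grepl("^\\d+$", txt)

# Shift idx one position to the right so that the counter increases on the
# entry AFTER each score, then accumulate to get a group number per entry
shifted <- head(c(FALSE, idx), -1)
group <- cumsum(shifted)

# Side-by-side view of the intermediate steps
steps <- data.frame(txt, idx, shifted, group)
print(steps)
```

Each score stays in the same group as the names before it, because the counter only increases one position later.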
lst <- split(txt[!idx], group[!idx])
cols <- unique(unlist(lst, FALSE, FALSE))
df <- cbind(
  setNames(
    do.call(rbind.data.frame, lapply(lst, is.element, el = cols)),
    cols
  ),
  val = as.integer(txt[idx])
)
# alpha bravo charlie delta val
# 0 TRUE TRUE TRUE FALSE 5
# 1 TRUE FALSE TRUE FALSE 2
# 2 FALSE FALSE FALSE TRUE 1
unname(cbind.data.frame(
  do.call(rbind, lapply(lst, function(x) {
    res <- setNames(x, x)[cols]   # align each answer under its own column
    ifelse(is.na(res), "", res)   # blank where the individual gave no such answer
  })),
  as.integer(txt[idx])
))
# 0 alpha bravo charlie       5
# 1 alpha       charlie       2
# 2                     delta 1
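The same TRUE/FALSE layout can also be reached by cross-tabulating individuals against answers. The following is a sketch of that alternative, not the method used above; it assumes every individual gives at least one non-numeric answer (otherwise the rows of the table and the scores would not line up). The names `answers`, `flags`, and `wide` are illustrative.

```r
txt <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")
idx <- grepl("^\\d+$", txt)
group <- cumsum(head(c(FALSE, idx), -1))

# A factor with levels in order of first appearance keeps the column order stable
answers <- factor(txt[!idx], levels = unique(txt[!idx]))

# Count each (individual, answer) pair; unclass() drops the "table" class so that
# data.frame() treats the result as a plain matrix, and > 0 turns counts into flags
flags <- unclass(table(individual = group[!idx], answer = answers)) > 0

wide <- data.frame(flags, val = as.integer(txt[idx]))
print(wide)
```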
Here is the second-best option, obtained after creating id and counter variables:
# reshape() is part of base R (package stats), so reshape2 does not need to be loaded
row <- c('alpha', 'bravo', 'charlie', '5', 'alpha', 'charlie', '2', 'delta', '1')
df <- data.frame(row, id = 1, counter = seq_along(row))
for (i in 1:(nrow(df) - 1)) {
  # if this entry is a number, increment the id of all following rows
  if (grepl('^[0-9]+$', df[i, 1])) {
    df[(i + 1):nrow(df), 2] <- df[i + 1, 2] + 1
  }
}
for (i in 2:nrow(df)) {
  # restart the counter at 1 whenever a new id begins
  if (df[i, 2] > df[i - 1, 2]) {
    df[i, 3] <- 1
  } else {
    df[i, 3] <- df[i - 1, 3] + 1
  }
}
reshape(df, idvar = 'id', timevar = 'counter', direction = 'wide')
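The two loops above can probably be replaced by vectorised calls. This is a sketch under the same assumption that a purely numeric entry ends an individual's record; `is_num` and `wide` are names introduced here, and `ave()` is used to restart the counter within each id.

```r
row <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")
is_num <- grepl("^[0-9]+$", row)

# id increases on the entry that follows each score
id <- cumsum(c(TRUE, head(is_num, -1)))

# counter restarts at 1 within each id
counter <- ave(seq_along(row), id, FUN = seq_along)

df <- data.frame(row, id, counter, stringsAsFactors = FALSE)
wide <- reshape(df, idvar = "id", timevar = "counter", direction = "wide")
print(wide)
```

The result has one row per id and columns `row.1` to `row.4`, with `NA` where an individual has fewer entries.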
The ideal layout needs more information so that one knows how to assign the cells. The second-best layout can be achieved as follows.
x <- c('alpha',
'bravo',
'charlie',
'5',
'alpha',
'charlie',
'2',
'delta',
'1')
rowend <- grep("^[0-9]+$", x)
n <- length(rowend) # number of individuals
rowbegin <- c(1, head(rowend, n-1) + 1)
m <- max(rowend - rowbegin) + 1 # number of columns
# for each individual: names first, then blank padding, then the score in the last column
y <- Map(function(i, j) c(x[i:(j-1)], rep("", m - (j-i+1)), x[j]),
         rowbegin, rowend)
as.data.frame(matrix(unlist(y), nrow = n, ncol = m, byrow = TRUE))
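One edge case worth noting: if an individual had a score but no names, `i:(j-1)` would count backwards and pick up the previous entry. The sketch below shows one way to guard against that by using `seq_len()`; the function name `pad_to_matrix` and the sample input are hypothetical, chosen only to exercise that case.

```r
# Hypothetical helper: one row per individual, names left-aligned,
# blank padding in between, score in the last column
pad_to_matrix <- function(x) {
  is_num <- grepl("^[0-9]+$", x)
  group <- cumsum(c(FALSE, head(is_num, -1)))
  parts <- split(x, group)
  m <- max(lengths(parts))            # widest record, score included
  rows <- lapply(parts, function(p) {
    k <- length(p)
    # seq_len(k - 1) is empty when the record consists of a score only
    c(p[seq_len(k - 1)], rep("", m - k), p[k])
  })
  do.call(rbind, unname(rows))
}

# Made-up input in which the second individual has only a score
res <- pad_to_matrix(c("alpha", "5", "7", "beta", "gamma", "1"))
print(res)
```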
As the comments suggest, R may not be the best tool for the job here, but here is a crack at the ideal layout:
x = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1)
num = grepl("[0-9]",x)
l = split(x,c(0,cumsum(num)[-length(num)]))
# last element always numeric
vals = unlist(lapply(l, function(x) as.numeric(x[length(x)])))
nams = lapply(l, function(x) x[-length(x)])
# unique names for structure
unique_nams = unique(unlist(nams))
full = matrix(unique_nams, nrow = length(nams), ncol = length(unique_nams), byrow = TRUE)
# set to NA the names each individual did not give
for (i in seq_along(nams)) full[i, !unique_nams %in% nams[[i]]] = NA
answer = cbind(as.data.frame(full),vals)
## V1 V2 V3 V4 vals
## 0 alpha bravo charlie <NA> 5
## 1 alpha <NA> charlie <NA> 2
## 2 <NA> <NA> <NA> delta 1
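If blank cells are preferred over `<NA>`, as in the question's best layout, the missing entries can be replaced afterwards. This is a small sketch that rebuilds the printed result by hand and assumes the name columns are character vectors rather than factors (hence `stringsAsFactors = FALSE`); `out` is an illustrative name.

```r
# The result shown above, rebuilt by hand with character columns
answer <- data.frame(
  V1 = c("alpha", "alpha", NA),
  V2 = c("bravo", NA, NA),
  V3 = c("charlie", "charlie", NA),
  V4 = c(NA, NA, "delta"),
  vals = c(5, 2, 1),
  stringsAsFactors = FALSE
)

# Replace every missing name with an empty string; vals has no NA and stays numeric
out <- answer
out[is.na(out)] <- ""
print(out)
```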