
How to reshape data based on the contents of a data frame in R

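I have an R data frame with a single column and many rows. The column holds many individuals and their responses. I want to reshape the data so that there is one row per individual. However, there is no ID variable; the only pattern is that each individual's last entry is a numeric score. So you can infer that a new row should start after each number.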

Existing data format:

alpha
bravo
charlie
5
alpha
charlie
2
delta
1

dd <- data.frame(xx = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1))

I would like to rearrange this data into one of the following forms, listed from most to least desirable:

alpha   bravo   charlie          5    # Best
alpha           charlie          2
                          delta  1

or

alpha   bravo   charlie  5
alpha   charlie          2
delta                    1

or

alpha   bravo   charlie 5    # Worst but acceptable if above is not possible.
alpha   charlie 2
delta   1 

Here are a few options and formats:

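# same values as dd$xx, written as a character vector so the snippet also runs non-interactively
txt <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")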
idx <- grepl("^\\d+$", txt)
group <- cumsum(head(c(FALSE, idx), -1))
unname(split(txt, group))
# [[1]]
# [1] "alpha"   "bravo"   "charlie" "5"      
# 
# [[2]]
# [1] "alpha"   "charlie" "2"      
# 
# [[3]]
# [1] "delta" "1"

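# names only (scores dropped), grouped per individual
lst <- split(txt[!idx], group[!idx])
# every distinct name, in order of first appearance
cols <- unique(unlist(lst, recursive = FALSE, use.names = FALSE))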
df <- cbind(
  setNames(do.call(
      rbind.data.frame, 
      lapply(lst, is.element, el=cols)), 
    cols),
  val = as.integer(txt[idx])
)
#   alpha bravo charlie delta val
# 0  TRUE  TRUE    TRUE FALSE   5
# 1  TRUE FALSE    TRUE FALSE   2
# 2 FALSE FALSE   FALSE  TRUE   1


unname(cbind.data.frame(
  do.call(rbind, lapply(lst, function(x) {
    res <- setNames(x, x)[cols]
    res <- ifelse(is.na(res), "", res)
  })), 
  as.integer(txt[idx])
))
# 0 alpha bravo charlie       5
# 1 alpha       charlie       2
# 2                     delta 1
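
If the ideal layout is not required, the groups produced by split() can also be padded to a common length, which gives the third (least preferred) layout from the question. This is only a minimal sketch under the same assumptions as above; the names `groups`, `width` and `padded` are made up for illustration.

```r
txt <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")

# TRUE where an entry is made up of digits only, i.e. closes one individual's record
idx <- grepl("^\\d+$", txt)
# group number changes on the entry that follows each score
group <- cumsum(head(c(FALSE, idx), -1))
groups <- unname(split(txt, group))

# pad every group on the right with "" up to the length of the longest group
width <- max(lengths(groups))
padded <- t(sapply(groups, function(g) c(g, rep("", width - length(g)))))

as.data.frame(padded, stringsAsFactors = FALSE)
```

Each row is left-aligned, so the score sits directly after the last name rather than in a fixed column.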

Here is the second-best option, obtained after creating id and counter variables:

# reshape() used below is base R (stats::reshape), so no extra package is needed

row <- c('alpha', 'bravo', 'charlie', '5', 'alpha', 'charlie', '2', 'delta', '1')
df <- data.frame(row, id = 1, counter = 1:length(row))
for (i in 1:(nrow(df) - 1)) {
  # if a number increment next id 
  if (length(grep('[0-9]+', df[i, 1])) > 0) {
    df[(i + 1):nrow(df), 2] <- df[i + 1, 2] + 1
  }
}

for (i in 2:nrow(df)) {
  # if new start set to 1
  if (df[i, 2] > df[i - 1, 2]) {
    df[i, 3] <- 1
  } else {
    df[i, 3] <- df[i - 1, 3] + 1
  }
}

reshape(df, idvar = 'id', timevar = 'counter', direction = 'wide')
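
The two loops above can probably be replaced by vectorised calls. The following is a sketch rather than a drop-in replacement: it assumes that a score is an entry consisting only of digits, and the names `is_num`, `df2` and `wide` are illustrative.

```r
row <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")

# TRUE where an entry consists only of digits, i.e. ends one individual's record
is_num <- grepl("^[0-9]+$", row)

# id goes up by one on the entry that follows each score
id <- cumsum(c(TRUE, head(is_num, -1)))

# counter restarts at 1 within each id
counter <- ave(seq_along(row), id, FUN = seq_along)

df2 <- data.frame(row, id, counter, stringsAsFactors = FALSE)
wide <- reshape(df2, idvar = "id", timevar = "counter", direction = "wide")
wide
```

As with the loop version, shorter records end up with NA in the trailing columns.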

The ideal layout needs more information, so that one knows how to assign the cells. The second-best layout can be achieved as follows.

x <- c('alpha',
       'bravo',
       'charlie',
       '5',
       'alpha',
       'charlie',
       '2',
       'delta',
       '1')

rowend <- grep("^[0-9]+$", x)
n <- length(rowend) # number of individuals
rowbegin <- c(1, head(rowend, n-1) + 1)
m <- max(rowend - rowbegin) + 1  # number of column

y <- Map(function(i, j) c(x[i:(j-1)], rep("", m - (j-i+1)), x[j]), 
         rowbegin, rowend)
as.data.frame(matrix(unlist(y), nrow = n, ncol = m, byrow = TRUE))

Here is a crack at the ideal layout. As the comments suggest, R may not be the best tool for the job here, but...

x = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1)

num = grepl("[0-9]",x)

l = split(x,c(0,cumsum(num)[-length(num)]))
# last element always numeric
vals = unlist(lapply(l, function(x) as.numeric(x[length(x)])))
nams = lapply(l, function(x) x[-length(x)])
# unique names for structure
unique_nams = unique(unlist(nams))

full = matrix(unique_nams, nrow = length(nams), ncol = length(unique_nams), byrow = TRUE)

# blank out (set to NA) the names that individual i did not report
for (i in seq_along(nams)) full[i, !unique_nams %in% nams[[i]]] <- NA

answer = cbind(as.data.frame(full),vals)

##    V1    V2      V3    V4 vals
## 0 alpha bravo charlie  <NA>    5
## 1 alpha  <NA> charlie  <NA>    2
## 2  <NA>  <NA>    <NA> delta    1

