How to reshape data based on contents of dataframe R

I have an R data frame consisting of a single column and many rows. This column contains a number of individuals and their responses. I would like to reshape this data so that there is one row per individual. However, there is no ID variable, and the only pattern is that the last score for each individual is numeric. Hence you can deduce that whatever follows a number should start a new row.

Existing data format:

alpha
bravo
charlie
5
alpha
charlie
2
delta
1

dd <- data.frame(xx = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1))

I would like this data to be rearranged into one of the following forms, in order of most desirable to least desirable:

alpha   bravo   charlie          5    # Best
alpha           charlie          2
                          delta  1

or

alpha   bravo   charlie  5
alpha   charlie          2
delta                    1

or

alpha   bravo   charlie 5    # Worst but acceptable if above is not possible.
alpha   charlie 2
delta   1 

Here are some options and formats:

txt <- readLines(n=9)
alpha
bravo
charlie
5
alpha
charlie
2
delta
1
idx <- grepl("^\\d+$", txt)
group <- cumsum(head(c(FALSE, idx), -1))
unname(split(txt, group))
# [[1]]
# [1] "alpha"   "bravo"   "charlie" "5"      
# 
# [[2]]
# [1] "alpha"   "charlie" "2"      
# 
# [[3]]
# [1] "delta" "1"

lst <- split(txt[!idx], group[!idx])
cols <- unique(unlist(lst, FALSE, FALSE))
df <- cbind(
  setNames(do.call(
      rbind.data.frame, 
      lapply(lst, is.element, el=cols)), 
    cols),
  val = as.integer(txt[idx])
)
#   alpha bravo charlie delta val
# 0  TRUE  TRUE    TRUE FALSE   5
# 1  TRUE FALSE    TRUE FALSE   2
# 2 FALSE FALSE   FALSE  TRUE   1


unname(cbind.data.frame(
  do.call(rbind, lapply(lst, function(x) {
    res <- setNames(x, x)[cols]
    res <- ifelse(is.na(res), "", res)
  })), 
  as.integer(txt[idx])
))
# 0 alpha bravo charlie       5
# 1 alpha       charlie       2
# 2                     delta 1

This gives you the second-best option, after creating an id and a counter variable:

# reshape() ships with base R (package stats), so reshape2 does not need to be loaded

row <- c('alpha', 'bravo', 'charlie', '5', 'alpha', 'charlie', '2', 'delta', '1')
df <- data.frame(row, id = 1, counter = 1:length(row))
for (i in 1:(nrow(df) - 1)) {
  # if a number increment next id 
  if (length(grep('[0-9]+', df[i, 1])) > 0) {
    df[(i + 1):nrow(df), 2] <- df[i + 1, 2] + 1
  }
}

for (i in 2:nrow(df)) {
  # if new start set to 1
  if (df[i, 2] > df[i - 1, 2]) {
    df[i, 3] <- 1
  } else {
    df[i, 3] <- df[i - 1, 3] + 1
  }
}

reshape(df, idvar = 'id', timevar = 'counter', direction = 'wide')
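
The two loops above can also be sketched in vectorised form. This is only an illustration under the same rule (a purely numeric entry closes an individual's record); the names `is_num` and `wide` are introduced here and are not part of the original answer.

```r
row <- c("alpha", "bravo", "charlie", "5", "alpha", "charlie", "2", "delta", "1")

# TRUE where the entry consists of digits only, i.e. the last entry of an individual
is_num <- grepl("^[0-9]+$", row)

# a new id starts at the first entry and at every entry that follows a number
id <- cumsum(c(TRUE, head(is_num, -1)))

# counter restarts at 1 within each id
counter <- ave(seq_along(row), id, FUN = seq_along)

df <- data.frame(row, id, counter, stringsAsFactors = FALSE)
wide <- reshape(df, idvar = "id", timevar = "counter", direction = "wide")
wide
```

`cumsum()` over the shifted logical vector yields the id, and `ave()` numbers the entries within each id, so no explicit loop is needed.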

The ideal layout requires more information, so that it is clear how the cells should be allocated to columns. The second-best layout can be achieved as follows.

x <- c('alpha',
       'bravo',
       'charlie',
       '5',
       'alpha',
       'charlie',
       '2',
       'delta',
       '1')

rowend <- grep("^[0-9]+$", x)
n <- length(rowend) # number of individuals
rowbegin <- c(1, head(rowend, n-1) + 1)
m <- max(rowend - rowbegin) + 1  # number of columns

y <- Map(function(i, j) c(x[i:(j-1)], rep("", m - (j-i+1)), x[j]), 
         rowbegin, rowend)
as.data.frame(matrix(unlist(y), nrow = n, ncol = m, byrow = TRUE))
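
One caveat with the code above: if an individual has a score but no names, `i:(j-1)` counts backwards and picks up the wrong elements. A possible variant that avoids this by splitting into groups first is sketched below. The input vector is hypothetical (its second individual has only a score), and `parts`, `rows` and `out` are names introduced for this illustration.

```r
# hypothetical input: the second individual consists of a score only
x <- c("alpha", "bravo", "charlie", "5", "7", "delta", "1")

is_num <- grepl("^[0-9]+$", x)
group  <- cumsum(c(TRUE, head(is_num, -1)))  # new group after each number
parts  <- split(x, group)
m      <- max(lengths(parts))                # widest individual, score included

# keep the names on the left, pad with "", and put the score in the last column
rows <- lapply(parts, function(p) {
  k <- length(p)
  c(p[-k], rep("", m - k), p[k])
})

out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
out
```

Negative indexing (`p[-k]`) returns an empty vector for a score-only group, so the padding still lines up.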

A crack at the ideal layout. As intimated in the comments, R is probably not the best tool for the job here, but ...

x = c("alpha","bravo","charlie",5,"alpha","charlie",2,"delta",1)

num = grepl("[0-9]",x)

l = split(x,c(0,cumsum(num)[-length(num)]))
# last element always numeric
vals = unlist(lapply(l, function(x) as.numeric(x[length(x)])))
nams = lapply(l, function(x) x[-length(x)])
# unique names for structure
unique_nams = unique(unlist(nams))

full = matrix(unique_nams, nrow = length(nams), ncol = length(unique_nams), byrow = TRUE)

# reassign
sapply(seq_along(nams), function(i) full[i,!unique_nams %in% nams[[i]]] <<- NA)

answer = cbind(as.data.frame(full),vals)

##    V1    V2      V3    V4 vals
## 0 alpha bravo charlie  <NA>    5
## 1 alpha  <NA> charlie  <NA>    2
## 2  <NA>  <NA>    <NA> delta    1
