Different ways to expand data in R

Question

I have the following data, and I would like to expand it into one row per observation. For example, if June has two successes and one failure, my dataset should look like:

month | is_success
------------------
   6  |     T
   6  |     T
   6  |     F

The dataset is as follows:

# Months from July to December
months <- 7:12

# Number of success (failures) for each month
successes <- c(11,22,12,7,6,13)
failures <- c(20,19,11,16,13,10)

A sample solution is as follows:

dataset<-data.frame()

for (i in 1:length(months)) {
  dataset <- rbind(dataset,cbind(rep(months[i], successes[i]), rep(T, successes[i])))
  dataset <- rbind(dataset,cbind(rep(months[i], failures[i]), rep(F, failures[i])))
}

names(dataset) <- c("months", "is_success")
dataset[,"is_success"] <- as.factor(dataset[,"is_success"])

Question: What are the different ways to rewrite this code?

I am looking for a comprehensive answer that covers different but efficient approaches (matrix, loop, apply).

Thank you!

Answer 1

Here is one way with rep. Create a data frame 'd1' with 'months' and 'is_success', where 'is_success' is built by replicating 1 and 0 once per month. Then replicate its rows according to the values in 'successes' and 'failures', order by month if necessary, and reset the row names by assigning NULL.

d1 <- data.frame(months, is_success = factor(rep(c(1, 0), each = length(months))))
d2 <- d1[rep(1:nrow(d1), c(successes, failures)),]
d2 <- d2[order(d2$months),] 
row.names(d2) <- NULL

Now we check whether this is equal to the data created by the for loop:

all.equal(d2, dataset, check.attributes = FALSE)
#[1] TRUE

Or, as @thelatemail suggested, 'd1' can be created with expand.grid:

# The column is named 'months' so that d2$months in the ordering step still resolves
d1 <- expand.grid(months = months, is_success = 1:0)
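The same rep idea can also be sketched without the intermediate 'd1', by building the two columns directly as vectors. This is only an illustration: the name 'expanded' and the interleaving via rbind() are my own choices and not part of the answer above, and 'is_success' is kept as a logical here rather than a factor.

```r
months <- 7:12
successes <- c(11, 22, 12, 7, 6, 13)
failures <- c(20, 19, 11, 16, 13, 10)

# Interleave the counts per month: success, failure, success, failure, ...
counts <- as.vector(rbind(successes, failures))

expanded <- data.frame(
  # each month appears twice (success slot, failure slot) before being repeated
  months     = rep(rep(months, each = 2), times = counts),
  # TRUE/FALSE pattern aligned with the interleaved counts
  is_success = rep(rep(c(TRUE, FALSE), times = length(months)), times = counts)
)
```

Because the counts are interleaved month by month, the rows already come out ordered by month, so no separate ordering step is needed.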

Answer 2

Using mapply, you can try this:

createdf<-function(month,successes,failures){
    data.frame(month=rep(x = month,(successes+failures)), 
               is_success=c(rep(x = T,successes),
                            rep(x = F,failures))
               )
}

Now create a list of the required data frames:

lofdf<-mapply(FUN = createdf,months,successes,failures,SIMPLIFY = F)

And then combine them using the ldply function from the plyr package:

library(plyr)  # provides ldply
resdf<-ldply(lofdf,.fun = data.frame)
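If adding a dependency on plyr is not desirable, the list can probably be combined with base R alone. The following is a minimal sketch under that assumption; it restates the helper from this answer with TRUE/FALSE spelled out, and the use of Map() and do.call(rbind, ...) is my substitution rather than what the answer itself proposes.

```r
months <- 7:12
successes <- c(11, 22, 12, 7, 6, 13)
failures <- c(20, 19, 11, 16, 13, 10)

# Same helper as above: one data frame per month
createdf <- function(month, successes, failures) {
  data.frame(
    month      = rep(month, successes + failures),
    is_success = c(rep(TRUE, successes), rep(FALSE, failures))
  )
}

# Map() is a wrapper around mapply() that never simplifies, so it returns a list
lofdf <- Map(createdf, months, successes, failures)

# Stack the per-month data frames and reset the row names
resdf <- do.call(rbind, lofdf)
row.names(resdf) <- NULL
```

For a handful of months this is fine; with many list elements, repeated rbind can become slow, which is one reason dedicated binding functions exist.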

2 answers

Answer 1: 1, 2017-07-20 04:37:58
Answer 2: 1, 2017-07-20 04:52:37
