How can I make nested for loops in R that write output to a data frame more efficient?

Question

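I'm new to both R and Stack Overflow, so apologies if this question is inappropriate or badly structured.

I'm trying to write some R code that converts an nrow x ncol table/data frame into a data frame in which each row contains: the row number, the column number, and the value from row i, column j of the original table/data frame.

I have a number of tables/data frames that I want to process in the same way, each with a different number of rows and columns...

So, in this example, I have a data frame with 6 rows and 9 columns, which I want to convert into a data frame with 54 rows: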

#create example data
values <- rnorm(54, mean = 75, sd=3)
table_m <- matrix(values, ncol=9)
table <- as.data.frame(table_m)

My code so far is as follows:

##count rows and columns
nrows <- nrow(table)
ncols <- ncol(table)

#set up empty matrix for output
iterations <- nrows * ncols 
variables <-   3
output <- matrix(ncol=variables, nrow=iterations)

#set up first empty vector
my_vector_1 = c()

#run first nested for loop to create sequence of nrow * copies of column numbers
for (j in 1:ncol(table)) 
  for (i in 1:nrow(table))
  {
    my_vector_1[length(my_vector_1)+1] = colnames(table)[j]
  }

# add to first column of output
output[,1] <- my_vector_1

# set up second empty vector
my_vector_2 = c()

#run second nested for loop to create sequence of ncol * copies of row numbers
for (j in 1:ncol(table)) 
  for (i in 1:nrow(table))
  {
    my_vector_2[length(my_vector_2)+1] = rownames(table)[i]
  }

# add to second column of output
output[,2] <- my_vector_2

#create third empty vector
my_vector_3 = c()

#run third nested for loop to pull values from original table/dataframe
for (j in 1:ncol(table)) 
  for (i in 1:nrow(table))
  {
    my_vector_3[length(my_vector_3)+1] = table[i,j]
  }

output[,3] <- my_vector_3

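So, this code works and does what I need... but, being a noob, I cobbled it together from a lot of googling, and it looks very inelegant. In particular, creating intermediate vectors and then assigning them to the columns of the output data frame seems clumsy, but I couldn't work out how to put the values directly into the columns of my output data frame.

Any ideas on how to improve the code would be very welcome.

Thanks in advance...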

Answer 1

That is a perfectly good way to do it, but it can certainly be done more concisely. Try:

table$id <- 1:nrow(table) # Create a row no. column
tidyr::pivot_longer(table, cols = -id)
# A tibble: 54 x 3
      id name  value
   <int> <chr> <dbl>
 1     1 V1     70.3
 2     1 V2     72.8
 3     1 V3     76.1
 4     1 V4     73.1
 5     1 V5     71.9
 6     1 V6     73.8
 7     1 V7     76.4
 8     1 V8     74.1
 9     1 V9     75.5
10     2 V1     73.8
# ... with 44 more rows

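What are we doing here?

First, we add the "rownames" to the data as a column (because, for some reason, you want to keep them in the resulting data frame). Then we use the pivot_longer() function from the tidyr package. What you want to do with your data is reshape it. There are many ways to do this in R: reshape(), the reshape2 package, or the tidyr functions pivot_longer() and pivot_wider().

We want our "wide" data in "long" form. You might want to take a look at this Cheat Sheet: even though the functions gather() and spread() have been superseded by pivot_longer() and pivot_wider(), they work in essentially the same way.

With the function argument cols = -id, we specify that every variable except id should be pivoted: the column names go into the name column and the values into the value column of the new data frame.

If you want a matrix as the result, just run as.matrix() on the newly created object.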
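As a rough illustration of the base R reshape() route mentioned above, here is a minimal sketch. The object names wide and long, and the set.seed() call, are assumptions added for this example and are not part of the original answer.

```r
# Example data in the same shape as the question (6 rows x 9 columns)
set.seed(1)
wide <- as.data.frame(matrix(rnorm(54, mean = 75, sd = 3), ncol = 9))
wide$id <- seq_len(nrow(wide))  # row number column, as in the tidyr version

# Reshape wide -> long with base R only
long <- reshape(wide,
                direction = "long",
                varying   = paste0("V", 1:9),  # columns to stack
                v.names   = "value",           # name of the stacked value column
                timevar   = "name",            # column recording the source column
                times     = paste0("V", 1:9),  # labels stored in 'name'
                idvar     = "id")

head(long)
```

The result has one row per (id, name) pair, so 54 rows here. The row names it generates (such as 1.V1) can be dropped with rownames(long) <- NULL if they are not wanted.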
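To make the role of cols = -id more concrete, the sketch below spells out the two arguments that pivot_longer() otherwise fills in with its defaults, names_to = "name" and values_to = "value". The object names df and long and the set.seed() call are assumptions for this example.

```r
library(tidyr)

set.seed(1)
df <- as.data.frame(matrix(rnorm(54, mean = 75, sd = 3), ncol = 9))
df$id <- seq_len(nrow(df))  # row number column

# Every column except id is pivoted: its name goes to 'name', its value to 'value'
long <- pivot_longer(df, cols = -id, names_to = "name", values_to = "value")

head(long)
```

Changing names_to or values_to is a simple way to get different column names (for example "site" instead of "name") without a separate rename step.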

Answer 2

Base R solution:

data.frame(c(t(df)))

If we want to know which V vector in the original data.frame each value belongs to:

data.frame(var = paste0("V", seq_along(df)), val = c(t(df)))
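The pairing of var and val above relies on the order in which the values are flattened. A small sketch of the difference between flattening by row and by column, using a hypothetical 2 x 3 data frame named small:

```r
small <- data.frame(V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 6))

# t() transposes first, so c() walks the original data row by row
by_row <- c(t(small))
by_row   # 1 3 5 2 4 6

# Without t(), the columns are concatenated one after another
by_col <- unlist(small, use.names = FALSE)
by_col   # 1 2 3 4 5 6
```

With row-by-row order, the column labels repeat as V1, V2, V3, V1, ...; with column-by-column order, each label has to be repeated nrow times in a block instead.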

To also include a row index:

transform(data.frame(var = paste0("V", seq_along(df)), val = c(t(df)), stringsAsFactors = F),
          idx = ave(var, var, FUN = seq.int))

A more robust solution (following @r2evans's reasoning):

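# do.call("c", df) concatenates the columns one after another, so each name
# is repeated nrow(df) times to stay aligned with its own column's values
transform(data.frame(var = rep(names(df), each = nrow(df)), val = do.call("c", df), 
  stringsAsFactors = FALSE, row.names = NULL), idx = ave(var, var, FUN = seq.int))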

Another more robust solution, using stack():

transform(data.frame(stack(df), stringsAsFactors = FALSE, row.names = NULL),
          idx = ave(as.character(ind), ind, FUN = seq.int))

Edit 29/12/2020: mirroring @Ben's solution, but in base R:

transform(data.frame(name = as.character(rep(names(df), nrow(df))), value = c(t(df)),
  stringsAsFactors = FALSE), id = ave(name, name, FUN = seq.int))

The most direct base R solution (mirroring Ben's answer):

# Flatten the data.frame: 
stacked_df <- setNames(within(stack(df), {
  # Coerce index to character type (to enable counting):
  ind <- as.character(ind)
  # Issue a count to each ind element: 
  id <- ave(ind, ind, FUN = seq.int)
  }
  # Rename the data.frame's vectors to match Ben's accepted solution:
), c("value", "name", "id"))

# Order the data.frame as in Ben's answer: 
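# id is a character vector, so convert it to integer to sort numerically
# (otherwise "10" would sort before "2" once there are more than 9 rows):
ordered_df <- with(stacked_df, stacked_df[order(as.integer(id)), c("id", "name", "value")])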

Data:

values <- rnorm(54, mean = 75, sd=3)
table_m <- matrix(values, ncol=9)
df <- as.data.frame(table_m)

Answer 3

Based on the answer suggested by @hello_friend above, I was able to come up with this solution in base R:

##Set up example data
values <- rnorm(54, mean = 75, sd=3)
table_m <- matrix(values, ncol=9)
df <- as.data.frame(table_m)

##Create intermediate vectors
total_length <- nrow(df) * ncol(df)
# row number, repeated once for every column
statement_count <- rep(seq_len(nrow(df)), each = ncol(df), length.out = total_length)
# column number, cycling 1..ncol for every row
site_count <- rep(seq_len(ncol(df)), length.out = total_length)
# values flattened row by row, matching the two index vectors above
value <- c(t(df))

##join vectors into data frame
output <- data.frame(site = site_count, 
                     statement = statement_count,
                     value = value  
                     )

##sort output                    
output <- output[with(output, order(site, -statement)), ]

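This is certainly simpler and more intuitive than the series of for loops I originally used. Hopefully it will help others looking for a base R solution to a similar problem.

Also, for completeness, here is the "full" version of the tidyverse solution proposed by @Ben and @Ronak Shah: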
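For comparison, if a loop is still preferred, the three separate nested loops from the question can be collapsed into one pass that writes straight into preallocated columns. This is only a sketch: the counter k, the column names site, statement and value, and the set.seed() call are assumptions for this example.

```r
set.seed(42)
df <- as.data.frame(matrix(rnorm(54, mean = 75, sd = 3), ncol = 9))

n <- nrow(df) * ncol(df)

# Preallocate every column at its final length and type,
# instead of growing vectors one element at a time
output <- data.frame(site      = character(n),
                     statement = integer(n),
                     value     = numeric(n),
                     stringsAsFactors = FALSE)

k <- 0L  # running position in the output
for (j in seq_len(ncol(df))) {
  for (i in seq_len(nrow(df))) {
    k <- k + 1L
    output$site[k]      <- colnames(df)[j]  # column name, as in the question
    output$statement[k] <- i                # row number
    output$value[k]     <- df[i, j]         # value at row i, column j
  }
}

head(output)
```

The vectorised versions above will generally be faster and shorter; the loop is shown only because it answers the original question about filling the output columns directly.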

##Load the packages that provide %>%, mutate(), pivot_longer(), etc.
library(dplyr)
library(tidyr)

##Set up example data
values <- rnorm(54, mean = 75, sd=3)
table_m <- matrix(values, ncol=9)
table <- as.data.frame(table_m)

output_2 <- table %>% 
            mutate(statement = row_number()) %>%
            pivot_longer(cols = -statement)%>%
            rename(site = name)%>%
            relocate(site) %>%
            mutate(site = as.numeric(gsub("V", "", site))) %>%
            arrange(site, desc(statement))


Answer 1
0 accepted 2020-12-22 12:18:08

Answer 2
0 2020-12-22 12:32:20

Answer 3
0 2020-12-29 07:02:57
