
How to loop dcast function in reshape package in R

Being a relatively new R user, I have trouble with any looping functions. I have looked at many tutorials, but the examples in them are usually very basic and therefore easy to execute. However, I need to create slightly more complex loops and am having a lot of trouble figuring out how to do so. There are a few related looping questions on here and other forums, but none match exactly what I need, and though I have tried to adapt other answers to my current problem, I keep running into errors.

I have 2000 .csv files with data tabulated in long format (simplified example):

solution1    
> sol1     sol2     Istat
> s1       s2       0.435
> s1       s3       0.456
> s1       s4       0.845
> s1       s5       0.234

It is basically a summary of pairwise comparisons of 2000 individual solutions that I have, with the similarity between solutions summarised in an 'Istat' value.

I am trying to dcast each of these 2000 .csv files into a wide-format table (using the reshape2 package in R) so they look like this (following the example above):

     s1     s2     s3     s4     s5
s1   NA     0.435  0.456  0.845  0.234

I know how to do this just once with a single .csv file:

stat.cast <- dcast(solution1, sol2 ~ sol1, value.var="Istat")

But I can't seem to work it into a for loop, or even into lapply, which seems like it could be a possible solution here too.

The closest I was able to get with a for loop:

 # Get files from directory
loopout = "/Users/jc219806/Documents/Chapter 1/ANALYSES/R work/Istat/last_LoopOut/"
# List of file names inside folder
solutions <- list.files(loopout)
# Read all 2000 files inside
all.data <- lapply(solutions, read.csv, header=TRUE)
# Loop for performing reshape cast function to each listed dataframe
for (i in 1:length(all.data))
  {
  all.cast <- dcast(all.data, sol2 ~ sol1, value.var="Istat")
  }

But it keeps giving me the error that it is unable to recognise the "Istat" value from the input, even though it is there in the list of dataframes I have ("solutions" object in code above).

And with the lapply function:

lapply(solutions, dcast(all.data, sol2 ~ sol1, value.var="Istat"))

I get the same type of error:

Error: value.var (Istat) not found in input

I don't understand why, because it is listed in the list of dataframes as one of the variables in each of the 2000 dataframes. It seems like I am not getting it to loop through each of my 2000 .csv files properly, but I don't know how to fix that. I was also wondering whether it is possible to write the code so that it loops through binding all the 2000 outputs together according to column names. It's looping crazy.

I hope this is not as complicated a problem as it seems to me to be. Any help (along with some detailed explanations) or useful direction would be massively and sincerely appreciated. Thanks

You wrote:

for (i in 1:length(all.data))
  {
  all.cast <- dcast(all.data, sol2 ~ sol1, value.var="Istat")
  }

What you should have written:

all.cast <- list()
for (i in 1:length(all.data)) {
  all.cast[[i]] <- dcast(all.data[[i]], sol2 ~ sol1, value.var = "Istat")
}

But a more "R-esque" solution would be:

all.cast <- lapply(all.data, dcast, sol2 ~ sol1, value.var = "Istat")

Hopefully this makes it clear what you did wrong.
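A minimal sketch of why the original call fails, using two small made-up data frames in place of the real files (the values are illustrative only). Judging by the error message, dcast looks for value.var among the names of the object it is handed; the list as a whole has no "Istat" entry, only the data frames inside it do. This uses base R only.

```r
# Two made-up data frames standing in for the contents of the real .csv files
df1 <- data.frame(sol1 = "s1", sol2 = c("s2", "s3"), Istat = c(0.435, 0.456))
df2 <- data.frame(sol1 = "s1", sol2 = c("s2", "s3"), Istat = c(0.420, 0.536))
all.data <- list(df1, df2)

# The list itself has no element called "Istat" ...
in_list <- "Istat" %in% names(all.data)          # FALSE
# ... but each data frame stored in the list does
in_element <- "Istat" %in% names(all.data[[1]])  # TRUE
```

This is why indexing with all.data[[i]] inside the loop, or letting lapply hand over one element at a time, makes the error go away.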

"all.data" is a list of dataframes. To loop over the list, you can use lapply with an anonymous function (just to be clear) and apply dcast inside it.

library(reshape2)
lapply(all.data, function(x) dcast(x, sol1 ~ sol2, value.var="Istat"))

Or, instead of casting each data frame individually, the list can be row-bound (rbind) into a single dataframe with a grouping variable identifying each list element, and then reshaped with either dcast or spread from library(tidyr).

library(dplyr)
library(tidyr)
# bind_rows() stacks the list; .id stores each element's name in "group"
bind_rows(all.data, .id = "group") %>%
  spread(sol2, Istat)

Or using data.table:

library(data.table)
dcast(rbindlist(Map(cbind, all.data, group=seq_along(all.data))),
                 group + sol1 ~sol2, value.var='Istat')

Data

all.data <- structure(list(
  solution1 = structure(list(
    sol1 = c("s1", "s1", "s1", "s1"),
    sol2 = c("s2", "s3", "s4", "s5"),
    Istat = c(0.435, 0.456, 0.845, 0.234)),
    .Names = c("sol1", "sol2", "Istat"),
    class = "data.frame", row.names = c(NA, -4L)),
  solution2 = structure(list(
    sol1 = c("s1", "s1", "s1", "s1"),
    sol2 = c("s2", "s3", "s4", "s5"),
    Istat = c(0.42, 0.536, 0.945, 0.324)),
    .Names = c("sol1", "sol2", "Istat"),
    class = "data.frame", row.names = c(NA, -4L))),
  .Names = c("solution1", "solution2"))
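On the second part of the question (binding the per-file wide tables together by column name), here is a rough sketch, assuming dplyr is installed. The two wide tables are hand-made stand-ins for dcast output with illustrative values, and the second one deliberately lacks the s3 column to show how mismatched columns are handled.

```r
library(dplyr)

# Hand-made wide tables standing in for dcast() output (illustrative values)
wide1 <- data.frame(sol1 = "s1", s2 = 0.435, s3 = 0.456)
wide2 <- data.frame(sol1 = "s1", s2 = 0.420, s4 = 0.945)

# bind_rows() aligns columns by name and fills the gaps with NA;
# .id records which list element each row came from
combined <- bind_rows(list(solution1 = wide1, solution2 = wide2), .id = "group")
```

If every file is known to produce exactly the same columns, do.call(rbind, ...) on the list of cast tables should work as well; bind_rows is simply more forgiving when the column sets differ.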

I would melt your "all.data" list and then dcast it to a wide form. Something like:

## Sample data
set1 <- set2 <- data.frame(sol1 = c("s1", "s1", "s1", "s1"), 
                   sol2 = c("s2", "s3", "s4", "s5"), 
                   Istat = c(0.435, 0.456, 0.845, 0.234))
set2$Istat <- set2$Istat + 1 ## Just to see some different data

all.data <- mget(ls(pattern = "set\\d+")) ## use your actual object

## The reshaping
library(reshape2)
dcast(melt(all.data, id.vars = c("sol1", "sol2")), 
      L1 + sol1 ~ sol2, value.var = "value")
#     L1 sol1    s2    s3    s4    s5
# 1 set1   s1 0.435 0.456 0.845 0.234
# 2 set2   s1 1.435 1.456 1.845 1.234

If your "all.data" object has names, "L1" will reflect those names, which can be quite convenient in the long run.
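One possible way to get such names is to take them from the file names when reading the data. The sketch below is only an illustration: the folder and the two files are hypothetical and are created in a temporary directory so that the code can run on its own. It uses full.names = TRUE so that read.csv receives complete paths regardless of the working directory.

```r
# Hypothetical folder with two small files, created here only for illustration
loopout <- file.path(tempdir(), "last_LoopOut")
dir.create(loopout, showWarnings = FALSE)
example <- data.frame(sol1 = "s1", sol2 = c("s2", "s3"), Istat = c(0.435, 0.456))
write.csv(example, file.path(loopout, "solution1.csv"), row.names = FALSE)
write.csv(example, file.path(loopout, "solution2.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from any working directory
files <- list.files(loopout, pattern = "\\.csv$", full.names = TRUE)
all.data <- lapply(files, read.csv, header = TRUE)

# Name each list element after its file, dropping the directory and extension
names(all.data) <- tools::file_path_sans_ext(basename(files))
```

With the list named this way, the "L1" column (or a .id / group column in the other approaches) would carry "solution1", "solution2", and so on, rather than bare positions.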


 