Create a new row containing column sums for every data frame in a list

I have a list of multiple data frames. Example data:

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)

For each data frame, I'd like to create a new row at the bottom containing the sum of each column. So for df1 it would look like this:

Name E1 E2
"A"  0  1
"B"  NA 0
"C"  1  1
Sum  1  2

This is what I tried:

ls <- lapply(ls, function(x) {
  x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
})

I received the following error message:

Error in colSums(x[,-1], na.rm = TRUE) : 'x' must be numeric

All of my columns except "Name" contain just 1's, 0's, and NA's, so I thought that maybe they're being read as factors instead of numeric. My first attempt to coerce to numeric (which looked like the function below but without "unlist") resulted in an error (object type list cannot be coerced to type 'double'), so I tried the following based on the answer in another post:

ls <- lapply(ls, function(x) {
  x[,-1] <- as.numeric(unlist(x[,-1]))
})

But that just gives me a list of numeric vectors, not a list of data frames like I want. Any advice on either fixing my original colSums function or successfully converting my data to numeric would be greatly appreciated!

You are very close! Your current function returns only the column sums (the new last row), because a function returns the value of its last evaluated expression, and here that is the assignment, whose value is its right-hand side rather than the modified data frame. So you need something like the following. The as.character call is there because the strings were read in as factors (the data.frame default before R 4.0.0), which would not let you put "Sum" into the frame the right way.

In general though, unless this is for some kind of output, storing summary stats as a row inside the table is not a very tidy practice, because it becomes confusing when some rows contain data and others do not.

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)

lapply(ls, function(x) {
  x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
  x[, 1] <- as.character(x[, 1])
  x[nrow(x), 1] <- "Sum"
  return(x)
})
#> [[1]]
#>   Name E1 E2
#> 1    A  0  1
#> 2    B NA  0
#> 3    C  1  1
#> 4  Sum  1  2
#> 
#> [[2]]
#>   Name E1 E2
#> 1    A  1  0
#> 2    C  0  0
#> 3    F  1  0
#> 4  Sum  2  0

Created on 2018-03-16 by the reprex package (v0.2.0).
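To illustrate the point about return values, here is a minimal base R sketch. The function names returns_sums and returns_frame are made up for this illustration, and stringsAsFactors = FALSE is set explicitly so the example behaves the same on old and new R versions.

```r
df <- data.frame(Name = c("A", "B", "C"), E1 = c(0, NA, 1), E2 = c(1, 0, 1),
                 stringsAsFactors = FALSE)

# The last expression is an assignment, so the function returns the assigned
# value (the column sums), not the modified data frame.
returns_sums <- function(x) {
  x[nrow(x) + 1, -1] <- colSums(x[, -1], na.rm = TRUE)
}

# Ending the body with `x` returns the whole modified data frame.
returns_frame <- function(x) {
  x[nrow(x) + 1, -1] <- colSums(x[, -1], na.rm = TRUE)
  x
}

res_sums  <- returns_sums(df)   # a numeric vector of length 2
res_frame <- returns_frame(df)  # a data frame with 4 rows

class(res_sums)
class(res_frame)
```

The first call yields only the sums, which is what lapply() collected in the original attempt; the second yields the data frame with the extra row.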

For the sake of completeness, here is also a data.table solution. data.table is much more tolerant when adding character values to a factor column. No explicit type conversion is required.

In addition, I want to suggest an alternative to a "list of data.frames".

library(data.table)
lapply(ls, function(x) rbind(setDT(x),  
  x[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2")]
))
[[1]]
   Name E1 E2
1:    A  0  1
2:    B NA  0
3:    C  1  1
4:  sum  1  2

[[2]]
   Name E1 E2
1:    A  1  0
2:    C  0  0
3:    F  1  0
4:  sum  2  0

The Name columns are still factors but with an additional factor level, as can be seen by applying str() to the result:

List of 2
 $ :Classes 'data.table' and 'data.frame':	4 obs. of  3 variables:
  ..$ Name: Factor w/ 4 levels "A","B","C","sum": 1 2 3 4
  ..$ E1  : num [1:4] 0 NA 1 1
  ..$ E2  : num [1:4] 1 0 1 2
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ :Classes 'data.table' and 'data.frame':	4 obs. of  3 variables:
  ..$ Name: Factor w/ 4 levels "A","C","F","sum": 1 2 3 4
  ..$ E1  : num [1:4] 1 0 1 2
  ..$ E2  : num [1:4] 0 0 0 0
  ..- attr(*, ".internal.selfref")=<externalptr>

Alternative to list of data.frames

If the data.frames in the list all have the same structure, i.e., the same number, type, and name of columns, then I prefer to store the data in one object:

library(data.table)
DT <- rbindlist(ls, idcol = "df.id")
DT
   df.id Name E1 E2
1:     1    A  0  1
2:     1    B NA  0
3:     1    C  1  1
4:     2    A  1  0
5:     2    C  0  0
6:     2    F  1  0

The origin of each row is identified by the number in df.id. Now, we can use grouping instead of looping through the elements of the list, e.g.,

DT[, lapply(.SD, sum, na.rm = TRUE), .SDcols = c("E1", "E2"), by = df.id]
   df.id E1 E2
1:     1  1  2
2:     2  2  0

Or, if the sum rows are to be interspersed within the original data:

rbind(
  DT,
  DT[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2"), by = df.id]
)[order(df.id)]
   df.id Name E1 E2
1:     1    A  0  1
2:     1    B NA  0
3:     1    C  1  1
4:     1  sum  1  2
5:     2    A  1  0
6:     2    C  0  0
7:     2    F  1  0
8:     2  sum  2  0
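For readers without data.table, the same "one object plus grouping" idea can be sketched in base R. This is only a rough analogue, not the data.table approach itself: it uses Map(), do.call(rbind, ...) and aggregate() in place of rbindlist() and grouped .SD, and the names frames, combined and sums are chosen for this illustration.

```r
df1 <- data.frame(Name = c("A", "B", "C"), E1 = c(0, NA, 1), E2 = c(1, 0, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("A", "C", "F"), E1 = c(1, 0, 1), E2 = c(0, 0, 0),
                  stringsAsFactors = FALSE)
frames <- list(df1, df2)

# Stack all data frames into one, tagging each row with the index of the
# data frame it came from (the role idcol plays in rbindlist()).
combined <- do.call(
  rbind,
  Map(function(d, id) cbind(df.id = id, d), frames, seq_along(frames))
)

# Sum E1 and E2 within each df.id group; na.rm = TRUE is passed on to sum().
sums <- aggregate(combined[c("E1", "E2")],
                  by = list(df.id = combined$df.id),
                  FUN = sum, na.rm = TRUE)
sums
```

The non-formula interface of aggregate() is used here on purpose: the formula interface drops rows containing NA by default, which would silently discard row B before summing.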

Another option is to use rbind and Map:

Map(rbind, ls, lapply(ls, 
        function(x)sapply(x, 
         function(x)if(class(x) == "character"){ "Sum:" }else{ sum(x, na.rm = TRUE)})))
# [[1]]
# Name   E1 E2
# 1    A    0  1
# 2    B <NA>  0
# 3    C    1  1
# 4 Sum:    1  2
# 
# [[2]]
# Name E1 E2
# 1    A  1  0
# 2    C  0  0
# 3    F  1  0
# 4 Sum:  2  0

Data

Note: The Name column has been changed to character for the above solution.

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1),
        stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0),
        stringsAsFactors = FALSE)
ls <- list(df1, df2)
lapply(ls,function(i) 
data.frame(rbind(apply(i,2,as.vector),c("Sum",colSums(i[,-1],na.rm = TRUE) ))))

You could use rbind:

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1), stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0), stringsAsFactors = FALSE)
ls <- list(df1, df2)

ls <- lapply(ls, function(x) {
  x <- rbind(x, c(
    "Sum", 
    sum(x[, "E1"], na.rm = TRUE),
    sum(x[, "E2"], na.rm = TRUE)))
})
ls

Which yields

[[1]]
  Name   E1 E2
1    A    0  1
2    B <NA>  0
3    C    1  1
4  Sum    1  2

[[2]]
  Name E1 E2
1    A  1  0
2    C  0  0
3    F  1  0
4  Sum  2  0
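The <NA> in the first result is a sign that the E1 and E2 columns were coerced to character, because the new row was passed as a character vector. If the columns should stay numeric, one possible variation is to bind a one-row data frame instead. This is a sketch under the assumption that the first column is the only non-numeric one; the helper name add_sum_row and the list name frames are made up here.

```r
df1 <- data.frame(Name = c("A", "B", "C"), E1 = c(0, NA, 1), E2 = c(1, 0, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("A", "C", "F"), E1 = c(1, 0, 1), E2 = c(0, 0, 0),
                  stringsAsFactors = FALSE)
frames <- list(df1, df2)

# Build the summary row as a one-row data frame so that rbind() matches
# columns by name and keeps E1 and E2 numeric.
add_sum_row <- function(x) {
  sums <- as.list(colSums(x[, -1, drop = FALSE], na.rm = TRUE))
  rbind(x, data.frame(Name = "Sum", sums, stringsAsFactors = FALSE))
}

with_sums <- lapply(frames, add_sum_row)
str(with_sums[[1]])
```

Because every column of the appended row already has the right type, no as.character or as.numeric conversion is needed afterwards.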
