Returning data frame as main result but also informative list as side effect

I'm writing a function whose main output should be a data frame (so that it can be piped to other functions), but I also want to give users access to an informative list or vector of the samples that were omitted from the final result. Are there best practices for how to go about this, or examples of functions/packages that do this well?

Currently I'm exploring returning the information as an attribute and throwing a warning that tells users they can access the list with attr(resulting-df, "omitted").

Any advice would be greatly appreciated, thank you!

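library(dplyr)

# Add a row index so omitted rows can be identified after filtering
iris <- iris %>%
  mutate(index = row_number())

return_filtered <- function(df) {

  res <- filter(df, Sepal.Length > 6)
  # Compare against the input `df` rather than the global `iris`
  omitted <- setdiff(df$index, res$index)

  # Attach the omitted row indices to the result as an attribute
  attr(res, "omitted") <- omitted
  return(res)

}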

iris2 <- return_filtered(iris)
attributes(iris2)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> [6] "index"       
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61
#> 
#> $omitted
#>  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
#> [20]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
#> [39]  39  40  41  42  43  44  45  46  47  48  49  50  54  56  58  60  61  62  63
#> [58]  65  67  68  70  71  79  80  81  82  83  84  85  86  89  90  91  93  94  95
#> [77]  96  97  99 100 102 107 114 115 120 122 139 143 150

Created on 2022-04-02 by the reprex package (v2.0.1)

The question is probably a little opinion-based, but I don't think it's off-topic, since there are certainly neater and more formal ways to achieve what you want than your current method.

It's reasonable to hold the extra information as an attribute, but if you are going to do this then it is more idiomatic and extensible to create an S3 class. That lets you hide the default printing of attributes, ensure your attributes are protected, and define a getter function so that users don't have to sift through multiple attributes to find the right one.

First, we will tweak your function to work with any data frame, and allow it to take any predicate so that it works as expected with dplyr::filter. We also have the function prepend a new class to the returned object's class attribute, so that it returns a new S3 object which inherits from data.frame.

return_filtered <- function(df, predicate) {
  predicate    <- rlang::enquo(predicate)   # capture the unevaluated predicate
  df$`..id..`  <- seq_len(nrow(df))         # temporary row id; safe for 0-row input
  res          <- dplyr::filter(df, !!predicate)
  filtered     <- setdiff(seq_len(nrow(df)), res$`..id..`)  # rows that were dropped
  res$`..id..` <- NULL
  
  attr(res, "filtered") <- filtered
  class(res)            <- c("filtered", class(df))
  
  return(res)
}
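
The class-tagging step in the function above can also be factored into a small constructor, which gives one place to validate the attribute. This is only a sketch and not part of the original answer: the name new_filtered is hypothetical, and the example builds its object by hand with base R rather than calling return_filtered.

```r
# Hypothetical low-level constructor (not from the original answer):
# validates its inputs, then attaches the attribute and the S3 class.
new_filtered <- function(df, filtered = integer()) {
  stopifnot(is.data.frame(df), is.numeric(filtered))
  attr(df, "filtered") <- as.integer(filtered)
  # setdiff() avoids adding "filtered" twice if df already carries the class
  class(df) <- c("filtered", setdiff(class(df), "filtered"))
  df
}

# Base-R example: keep rows 1-3 of iris and record rows 4-150 as omitted
x <- new_filtered(iris[1:3, ], filtered = 4:150)
class(x)
inherits(x, "data.frame")
```

Keeping the validation in one constructor means every function that produces a "filtered" object builds it the same way.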

We will define a print method so that the attributes don't show when we print our object:

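print.filtered <- function(x, ...) {
  # Print via the parent class's method by dropping "filtered" from a copy
  y <- x
  class(y) <- class(y)[class(y) != "filtered"]
  print(y, ...)
  invisible(x)  # print methods conventionally return their input invisibly
}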
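
If the printed object should also remind users that rows were dropped, a print method can append a one-line footer instead of showing the whole attribute. The following is a minimal sketch, not part of the original answer: it uses a separate, hypothetical class name filtered_note so that it does not replace the print.filtered method above, and it builds a small object by hand rather than calling return_filtered.

```r
# Hypothetical print method for an illustrative class "filtered_note".
print.filtered_note <- function(x, ...) {
  n_omitted <- length(attr(x, "filtered"))
  # Print the data with the parent class's method (here print.data.frame)
  y <- x
  class(y) <- setdiff(class(y), "filtered_note")
  print(y, ...)
  # Append a short footer rather than dumping every omitted index
  cat("# ", n_omitted, " row(s) omitted\n", sep = "")
  invisible(x)
}

# Hand-built example: first three rows of iris, the rest marked as omitted
demo <- head(iris, 3)
attr(demo, "filtered") <- 4:150
class(demo) <- c("filtered_note", class(demo))

print(demo)
```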

To get the indices of the filtered-out rows from the attribute, we can create a new generic function that will only work on our new class:

get_filtered <- function(x) UseMethod("get_filtered")

get_filtered.default <- function(x) {
  stop("'get_filtered' only works on filtered objects")
}

get_filtered.filtered <- function(x) {
  attr(x, "filtered")
}

So now, when we call return_filtered, it seems to work the same as dplyr::filter, returning what appears to be a normal data frame:

df <- return_filtered(iris, Sepal.Length > 7)

df
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1           7.1         3.0          5.9         2.1 virginica
#> 2           7.6         3.0          6.6         2.1 virginica
#> 3           7.3         2.9          6.3         1.8 virginica
#> 4           7.2         3.6          6.1         2.5 virginica
#> 5           7.7         3.8          6.7         2.2 virginica
#> 6           7.7         2.6          6.9         2.3 virginica
#> 7           7.7         2.8          6.7         2.0 virginica
#> 8           7.2         3.2          6.0         1.8 virginica
#> 9           7.2         3.0          5.8         1.6 virginica
#> 10          7.4         2.8          6.1         1.9 virginica
#> 11          7.9         3.8          6.4         2.0 virginica
#> 12          7.7         3.0          6.1         2.3 virginica

But we can get the indices of the filtered-out rows from it with our get_filtered function:

get_filtered(df)
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 104 105 107 109 111 112
#> [109] 113 114 115 116 117 120 121 122 124 125 127 128 129 133 134 135 137 138
#> [127] 139 140 141 142 143 144 145 146 147 148 149 150

And calling get_filtered on a non-filtered data frame returns an informative error:

get_filtered(iris)
#> Error in get_filtered.default(iris): 'get_filtered' only works on filtered objects

Created on 2022-04-02 by the reprex package (v2.0.1)
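
The question also mentions notifying users that rows were omitted. One possible approach is to emit a message() rather than a warning, since nothing has actually gone wrong. The sketch below is an assumption-laden illustration and not part of the original answer: filter_with_message is a hypothetical name, and it takes a logical vector and uses base R subsetting instead of dplyr::filter so that it runs on its own.

```r
# Hypothetical base-R sketch: keep rows where `keep` is TRUE, record the
# omitted row numbers, and tell the user how to retrieve them.
filter_with_message <- function(df, keep) {
  stopifnot(is.data.frame(df), is.logical(keep), length(keep) == nrow(df))
  keep[is.na(keep)] <- FALSE            # treat NA as "drop", like dplyr::filter
  res <- df[keep, , drop = FALSE]
  attr(res, "filtered") <- which(!keep)
  class(res) <- c("filtered", class(df))
  if (any(!keep)) {
    message(sum(!keep), " row(s) omitted; see attr(x, \"filtered\")")
  }
  res
}

out <- filter_with_message(iris, iris$Sepal.Length > 7)
length(attr(out, "filtered"))
```

Callers who do not want the note can wrap the call in suppressMessages(), which is harder to do selectively with warnings.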
