Returning a data frame as the main result, but also an informative list as a side effect
I'm writing a function where I want the main output to be a data frame (that can be piped to other functions), but I also want to give users access to an informative list or vector of the samples that were omitted from the final result. Are there best practices for how to go about this, or examples of functions/packages that do this well?
Currently I'm exploring returning the information as an attribute and throwing a warning that tells users they can access the list with attr(resulting-df, "omitted").
Any advice would be greatly appreciated, thank you!
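library(dplyr)
# Add a row index so omitted rows can be identified later
iris <- iris %>%
  mutate(index = 1:nrow(.))
return_filtered <- function(df) {
  res <- filter(df, Sepal.Length > 6)
  # Compare against the function's own argument, not the global `iris`
  omitted <- setdiff(df$index, res$index)
  attr(res, "omitted") <- omitted
  return(res)
}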
iris2 <- return_filtered(iris)
attributes(iris2)
#> $names
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#> [6] "index"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61
#>
#> $omitted
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
#> [20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
#> [39] 39 40 41 42 43 44 45 46 47 48 49 50 54 56 58 60 61 62 63
#> [58] 65 67 68 70 71 79 80 81 82 83 84 85 86 89 90 91 93 94 95
#> [77] 96 97 99 100 102 107 114 115 120 122 139 143 150
Created on 2022-04-02 by the reprex package (v2.0.1)
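The question mentions warning users that the omitted indices are available, but the reprex above does not include that step. A minimal sketch of that variant follows; it uses base R subsetting instead of dplyr so it runs without extra packages, and the name filter_noting_omitted() is hypothetical.

```r
# Hypothetical helper: keep rows where `keep` is TRUE, record the dropped
# row positions in an "omitted" attribute, and warn the user about it.
filter_noting_omitted <- function(df, keep) {
  # Treat NA as "drop", which is what dplyr::filter() does
  keep <- keep & !is.na(keep)
  res <- df[keep, , drop = FALSE]
  omitted <- which(!keep)
  attr(res, "omitted") <- omitted
  if (length(omitted) > 0) {
    warning(length(omitted), " rows were omitted; see attr(<result>, \"omitted\")",
            call. = FALSE)
  }
  res
}

iris2 <- filter_noting_omitted(iris, iris$Sepal.Length > 6)
head(attr(iris2, "omitted"))
```

Unlike dplyr::filter(), base subsetting keeps the original row names, so the row names of the result also point back to the input rows.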
The question is probably a little opinion-based, but I don't think it's off-topic, since there are certainly neater and more formal ways to achieve what you want than your current method.
It's reasonable to hold the extra information as an attribute, but if you are going to do this, it is more idiomatic and extensible to create an S3 class. That way you can hide the attributes when the object is printed, ensure your attributes are protected, and define a getter function for them so that users don't have to sift through multiple attributes to find the correct one.
First, we will tweak your function so that it works with any data frame and accepts any predicate, behaving as expected with dplyr::filter. We also have the function prepend "filtered" to the returned object's class attribute, so that it returns a new S3 object which inherits from data.frame:
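return_filtered <- function(df, predicate) {
  # Capture the predicate unevaluated so it can be passed on to dplyr::filter()
  predicate <- rlang::enquo(predicate)
  # Temporary row id; seq_len() also handles zero-row inputs correctly
  df$`..id..` <- seq_len(nrow(df))
  res <- dplyr::filter(df, !!predicate)
  # Row positions (in the input) that did not survive the filter
  filtered <- setdiff(seq_len(nrow(df)), res$`..id..`)
  res$`..id..` <- NULL
  attr(res, "filtered") <- filtered
  class(res) <- c("filtered", class(df))
  return(res)
}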
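One possible refinement, following the common S3 convention of a separate low-level constructor, is to move the class-setting step into its own helper. The name new_filtered() is hypothetical. A side benefit is that re-filtering an object that is already "filtered" does not produce c("filtered", "filtered", ...), which the class(res) <- c("filtered", class(df)) line above would do.

```r
# Hypothetical low-level constructor for the "filtered" class
new_filtered <- function(x, filtered = integer()) {
  stopifnot(is.data.frame(x), is.numeric(filtered))
  attr(x, "filtered") <- as.integer(filtered)
  # Avoid stacking "filtered" twice when the input is already filtered
  class(x) <- c("filtered", setdiff(class(x), "filtered"))
  x
}

x <- new_filtered(data.frame(a = 1:3), filtered = c(4, 5))
x2 <- new_filtered(x, filtered = 1)
class(x2)
```

With this helper, the last two assignments in return_filtered() could be replaced by new_filtered(res, filtered).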
We will define a print method so that the attributes don't show when we print our object:
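print.filtered <- function(x, ...) {
  # Print a copy without the "filtered" class so the underlying
  # data frame method handles the display
  y <- x
  class(y) <- class(y)[class(y) != "filtered"]
  print(y, ...)
  # Print methods conventionally return their input invisibly
  invisible(x)
}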
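An alternative way to write the same method, offered as a sketch rather than as the answer's code, is to let NextMethod() hand the object to the next method in its class vector (for class c("filtered", "data.frame") that is print.data.frame). The toy object below is built by hand only so the snippet runs on its own.

```r
# Sketch: delegate printing to the next class in class(x)
print.filtered <- function(x, ...) {
  NextMethod()
  # Return the original object invisibly, as print methods usually do
  invisible(x)
}

# Hand-built example object carrying a "filtered" attribute
x <- data.frame(a = 1:2)
attr(x, "filtered") <- 3L
class(x) <- c("filtered", "data.frame")
print(x)
```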
To get the row indices of the filtered-out rows from the attributes, we can create a new generic function that will only work on our new class:
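# Generic: dispatches on the class of x
get_filtered <- function(x) UseMethod("get_filtered")
# Fallback for any object that is not of class "filtered"
get_filtered.default <- function(x) {
  stop("'get_filtered' only works on filtered objects")
}
# Method for our class: return the stored row indices
get_filtered.filtered <- function(x) {
  attr(x, "filtered")
}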
So now, when we call return_filtered, it seems to work the same as dplyr::filter, returning what appears to be a normal data frame:
df <- return_filtered(iris, Sepal.Length > 7)
df
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 7.1 3.0 5.9 2.1 virginica
#> 2 7.6 3.0 6.6 2.1 virginica
#> 3 7.3 2.9 6.3 1.8 virginica
#> 4 7.2 3.6 6.1 2.5 virginica
#> 5 7.7 3.8 6.7 2.2 virginica
#> 6 7.7 2.6 6.9 2.3 virginica
#> 7 7.7 2.8 6.7 2.0 virginica
#> 8 7.2 3.2 6.0 1.8 virginica
#> 9 7.2 3.0 5.8 1.6 virginica
#> 10 7.4 2.8 6.1 1.9 virginica
#> 11 7.9 3.8 6.4 2.0 virginica
#> 12 7.7 3.0 6.1 2.3 virginica
But we can retrieve the row indices of the filtered-out rows with our get_filtered function:
get_filtered(df)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100 101 102 104 105 107 109 111 112
#> [109] 113 114 115 116 117 120 121 122 124 125 127 128 129 133 134 135 137 138
#> [127] 139 140 141 142 143 144 145 146 147 148 149 150
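Note that get_filtered() returns row positions in the original input rather than the rows themselves. If users want the omitted observations, they can index the original data with those positions. In this base-R sketch, which() on the negated condition stands in for get_filtered(df); the two agree here only because Sepal.Length has no missing values (dplyr::filter() drops NA rows, which which() would not report).

```r
# Rows kept by the condition used in the example above
kept <- iris[iris$Sepal.Length > 7, ]
# Stand-in for get_filtered(df): positions of the rows that were dropped
dropped_idx <- which(!(iris$Sepal.Length > 7))
# Recover the omitted observations from the original data
dropped <- iris[dropped_idx, ]
nrow(kept) + nrow(dropped) == nrow(iris)
```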
And calling get_filtered on a non-filtered data frame returns an informative error:
get_filtered(iris)
#> Error in get_filtered.default(iris): 'get_filtered' only works on filtered objects
Created on 2022-04-02 by the reprex package (v2.0.1)