
Subsetting an unnamed vector in R

I'm getting a vector of numbers as output from one function and want to drop all the values higher than 2900, then pipe the remainder directly into a second function. (The values will be sorted, if that helps.) Is there a clever way to do this seemingly simple thing without having to stop and define an intermediate variable?

Here is a way without creating a temp vector.

  1. The functions f and g are simple test functions that output a sequence of integers from 1 to their argument n. Function g additionally sets a random half of the output vector to NA.
  2. Function h sums its input vector.
  3. In the middle of the pipe, an anonymous function subsets the output of f or g and pipes the resulting vector to function h.
  4. In the case of the pipe from g, extra code is needed to remove the NAs, if that's what the user wants: either pass na.rm = TRUE to h or subset with which().

f <- function(n) seq.int(n)
g <- function(n){
  y <- seq.int(n)
  is.na(y) <- sample(n, n/2)
  y
}
h <- function(x, na.rm = FALSE) sum(x, na.rm = na.rm)

set.seed(2022)
f(3000) |> (\(x) x[x <= 2900])() |> h()
#> [1] 4206450

set.seed(2022)
g(3000) |> (\(x) x[x <= 2900])() |> h()
#> [1] NA

set.seed(2022)
g(3000) |> (\(x) x[x <= 2900])() |> h(na.rm = TRUE)
#> [1] 2080026

set.seed(2022)
g(3000) |> (\(x) x[which(x <= 2900)])() |> h()
#> [1] 2080026

Created on 2022-03-12 by the reprex package (v2.0.1)
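
The difference between the last two pipes comes down to how R indexes with a logical vector that contains NA. The following is a minimal sketch on a small made-up vector (x and le are illustrative names, not part of the answer's code): a comparison against NA yields NA, indexing with NA yields an NA element, and which() keeps only the positions that are TRUE.

```r
# Illustrative vector with one missing value and one value above the cutoff
x <- c(10, NA, 2950, 20)

le <- x <= 2900   # TRUE NA FALSE TRUE: comparing NA gives NA

x[le]             # 10 NA 20: the NA index produces an NA element
x[which(le)]      # 10 20: which() returns only the TRUE positions
```

This is why the plain logical subset needs na.rm = TRUE downstream, while the which() version does not.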


Edit

Following Mikael Jagan's comment, the input can also be piped into the first function, as below.

input <- 3000
input |> f() |> (\(x) x[x <= 2900])() |> h()
#> [1] 4206450

Created on 2022-03-12 by the reprex package (v2.0.1)

Second, simple, answer

I made it far too complicated before (see the first answer, below). Maybe there's an application where something like that is useful, but all you need is the dot placeholder of the magrittr pipe %>% (the pipe that dplyr re-exports):

library(magrittr)
1:10 %>% .[. < 4]
# 1 2 3
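
Applied to the situation in the question, the same idiom might look like the sketch below. It assumes the magrittr package is installed, and it uses seq.int() and sum() as stand-ins for the asker's two functions, which are not shown in the question.

```r
library(magrittr)

# seq.int(3000) stands in for the first function, sum() for the second
result <- seq.int(3000) %>% .[. <= 2900] %>% sum()
result
# 4206450
```

Because the dot appears directly as an argument of the subsetting call, magrittr does not also insert the piped vector as a first argument.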

First, convoluted, answer

A generic, prewritten function lets you pipe an unnamed vector in and out whenever the occasion arises.

vsubset <- function(v, condition) v[eval(str2expression(paste("v", condition)))]

1:10 %>% vsubset("<5")
# 1 2 3 4

To make it easier to understand for a moment, let's write three more basic versions:

equal_to <- function(v, equivalent) v[v == equivalent]
1:10 %>% equal_to(4)
# 4

less_than <- function(v, threshold) v[v < threshold]
1:10 %>% less_than(4)
# 1 2 3

greater_than <- function(v, threshold) v[v > threshold]
1:10 %>% greater_than(4)
# 5 6 7 8 9 10

I prefer, though, to only have one widely-applicable function. After all, these three are very incomplete: we at least still need <= , >= and != .

To do so, we

  1. write the condition as a string (e.g. "==3")
  2. combine it with the in-function name of the vector using paste()
  3. turn the string into an expression with str2expression()
  4. run the expression with eval()
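
The four steps above can be traced one at a time outside the function. This is only a sketch in base R (str2expression() needs R 3.6.0 or later); the intermediate names code_string, expr and keep are illustrative and do not appear in vsubset itself.

```r
v <- 1:10
condition <- "<5"                      # step 1: the condition as a string

code_string <- paste("v", condition)   # step 2: gives the string "v <5"
expr <- str2expression(code_string)    # step 3: parse the string into an expression
keep <- eval(expr)                     # step 4: evaluate it, giving a logical vector

v[keep]
# 1 2 3 4
```

Inside vsubset, eval() runs in the function's own frame, which is why the string has to refer to the argument name v.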

There very well may be a more efficient approach than eval(str2expression(paste(...))), but this has worked for me.
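
One possible alternative, sketched here as an assumption rather than as part of the original answer, is to pass the comparison operator itself instead of a string to be parsed. The name vsubset_op and its arguments are hypothetical; match.fun() is base R and accepts either a function or its name as a string. The example uses the native |> pipe (R 4.1 or later), so it does not need magrittr.

```r
# Hypothetical helper: subset v by applying a comparison operator to a value
vsubset_op <- function(v, op, value) {
  op <- match.fun(op)         # accepts "<" as well as `<`
  v[which(op(v, value))]      # which() also drops positions that compare to NA
}

1:10 |> vsubset_op("<", 5)
# 1 2 3 4

1:10 |> vsubset_op(`!=`, 3)
# 1 2 4 5 6 7 8 9 10
```

This avoids building and evaluating code from strings, at the cost of splitting the condition into two arguments.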
