Filtering by values in vectors using purrr

I want to write a function that accepts two arguments: a data.frame and a vector (here called id_var). It filters the data.frame by a value in id_var (e.g. the first value in the vector) and stores the resulting data.frame in a variable called data_filt_by_var.

If the number of rows in data_filt_by_var is bigger than one, it takes that same initial data.frame, filters by the same id_var value, selects the distinct end (end is the name of a column present in the data.frame), and gets the number of rows. If the number of rows is >= 1, it returns 1, else 0.

The problem is that it has to do this for each value in id_var. I cannot make this iteration work without using loops, which are not desirable. I wrote the following function, but it's not working.


is_this_unique = function(data, id_var) {
  data_filt_by_var = nrow(data[data$id == id_var, ])

  if (data_filt_by_var >= 1) {
    if (nrow(data[data$id == id_var, ] %>% 
             distinct(full_address)) == 1) {
      return(1)
    }
  } else {
    return(0)
  }
}

sample_data = (tibble::tribble(~id, ~full_address,
          1,'abc',
          1,'bcd',
          1,'abc',
          2,'qaa',
          2,'xcv',
          2,'qaa'))

id_var = c(1,2)

I was hoping to use map_dbl in this function.

The expected output would be:

input:

>is_this_unique(sample_data, id_var)

desired output:

[1] 0 1 0 1 0 1

The first 0 is because the first id and full_address pair (1 and abc) is not unique, and so on...

The function can be written in tidyverse without using any loops or purrr. This amounts to a grouped frequency count after filtering for the ids passed into the function. Here we group by 'id' and the column of interest (passed inside curly-curly, {{}}), then create a 0/1 column by checking whether the number of rows in each group (n()) equals 1. If we pass an 'id_var' that is not in the dataset, the result would be integer(0), which the if/else condition at the end changes to 0.

library(dplyr)

is_this_unique <- function(data, id_var, colNm) {
  out <- data %>%
    filter(id %in% id_var) %>%
    group_by(id, {{colNm}}) %>%
    transmute(n = +(n() == 1)) %>%
    pull(n)
  if (length(out) > 0) out else 0
}

is_this_unique(sample_data, 1:2, full_address)
#[1] 0 1 0 0 1 0

is_this_unique(sample_data, 1, full_address)
#[1] 0 1 0


is_this_unique(sample_data, 0, full_address)
#[1] 0
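Two details of the function above may be easier to see in isolation: the unary + coerces a logical vector to integer, and a comparison on a zero-length vector yields a zero-length result, which is why the final if/else fallback is there. A minimal base R sketch, using made-up counts rather than the real grouped data:

```r
# Made-up group sizes standing in for n(); TRUE where the group has one row
flags <- c(2, 1, 2) == 1

# Unary plus turns TRUE/FALSE into 1/0
as_int <- +flags
print(as_int)
#> [1] 0 1 0

# With no matching rows there is nothing to compare, so the result is empty
empty <- +(integer(0) == 1)
print(length(empty))
#> [1] 0

# The same fallback used at the end of the function
fallback <- if (length(empty) > 0) empty else 0
print(fallback)
#> [1] 0
```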

IMO purrr isn't suitable here; you can try this function instead.

library(dplyr)  

is_this_unique <- function(data, id_var) {
   temp_data <- data %>% filter(id %in% id_var)
   if (nrow(temp_data) > 0) 
      temp_data %>% 
         add_count(id, full_address) %>%
         mutate(n = +(n == 1)) %>%
         pull(n)
   else return(0)
}

is_this_unique(sample_data, 1:2)
#[1] 0 1 0 0 1 0

is_this_unique(sample_data, 1)
#[1] 0 1 0

is_this_unique(sample_data, 0)
#[1] 0
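For comparison, the per-id check described in the question could be sketched with purrr::map_dbl roughly as follows. This is only an illustration under stated assumptions: is_id_unique and demo_data are hypothetical names, the extra id 3 row is added purely for the demo, and the function returns one value per id rather than one per row, so it does not reproduce the row-wise output of the answers above.

```r
library(purrr)

# Hypothetical helper: returns one 0/1 value per id in id_var
is_id_unique <- function(data, id_var, col = "full_address") {
  map_dbl(id_var, function(one_id) {
    values <- data[[col]][data$id == one_id]
    # 1 only when the id is present and has a single distinct value
    if (length(values) >= 1 && length(unique(values)) == 1) 1 else 0
  })
}

# Demo data: the question's sample plus an assumed extra row for id 3
demo_data <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3),
  full_address = c("abc", "bcd", "abc", "qaa", "xcv", "qaa", "zzz")
)

result <- is_id_unique(demo_data, c(1, 2, 3, 0))
print(result)
#> [1] 0 0 1 0
```

Ids 1 and 2 each have two distinct addresses, id 3 has exactly one, and id 0 is absent, which is where the single 1 comes from.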
