Filter by values in a vector using purrr

Question

I want to write a function that accepts two arguments: a data.frame and a vector (here called id_var). It then filters the data.frame by a value in id_var (e.g. the first value in the vector) and stores the resulting data.frame in a variable called data_filt_by_var.

If the number of rows in data_filt_by_var is greater than one, it takes the same initial data.frame, filters by the same id_var value, selects the distinct end values (end is the name of a column present in the data.frame), and gets the number of rows. If that number of rows is >= 1, it returns 1, else 0.

The problem is that it has to do this for each value in id_var. I cannot make this iteration work without using loops, which are not desirable. I wrote the following function, but it's not working.


is_this_unique = function(data, id_var) {
  data_filt_by_var = nrow(data[data$id == id_var, ])

  if (data_filt_by_var >= 1) {
    if (nrow(data[data$id == id_var, ] %>% 
             distinct(full_address)) == 1) {
      return(1)
    }
  } else {
    return(0)
  }
}

sample_data = (tibble::tribble(~id, ~full_address,
          1,'abc',
          1,'bcd',
          1,'abc',
          2,'qaa',
          2,'xcv',
          2,'qaa'))

id_var = c(1,2)

I was hoping to use map_dbl in this function.

The expected output would be:

input:

>is_this_unique(sample_data, id_var)

desired output:

[1] 0 1 0 1 0 1

The first 0 is because the first id and full_address pair (1 and abc) is not unique, and so on...

Answer 1

The function can be written in tidyverse without any purrr loops. The task amounts to a group_by and a frequency count after filtering for the ids passed into the function. Here we group by id and the column of interest (passed inside curly-curly, {{}}) and create a logical column by checking whether the number of rows in each group (n()) equals 1; the unary + converts it to 0/1. If we pass an id_var value that is not in the dataset, the result would be integer(0), which can be changed to 0 with an if/else condition at the end.

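library(dplyr)

is_this_unique <- function(data, id_var, colNm) {
  out <- data %>%
    filter(id %in% id_var) %>%        # keep only the requested ids
    group_by(id, {{ colNm }}) %>%     # one group per id/value pair
    transmute(n = +(n() == 1)) %>%    # 1 if the pair occurs once, else 0
    pull(n)
  # an id that is not in the data gives a zero-length vector; return 0 instead
  if (length(out) > 0) out else 0
}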

is_this_unique(sample_data, 1:2, full_address)
#[1] 0 1 0 0 1 0

is_this_unique(sample_data, 1, full_address)
#[1] 0 1 0


is_this_unique(sample_data, 0, full_address)
#[1] 0
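To see what the grouping step in this answer is counting, it can help to look at the per-pair frequencies directly. The following is a minimal sketch, not part of the original answer: it rebuilds the sample_data from the question and uses dplyr's count(); the name pair_counts is introduced here purely for illustration.

```r
library(dplyr)

# Same sample data as in the question
sample_data <- tibble::tribble(
  ~id, ~full_address,
  1, "abc",
  1, "bcd",
  1, "abc",
  2, "qaa",
  2, "xcv",
  2, "qaa"
)

# One row per (id, full_address) pair, with its frequency in column n.
# `pair_counts` is a name chosen for this illustration only.
pair_counts <- sample_data %>%
  count(id, full_address)

# A pair is "unique" when it occurs exactly once; unary + turns TRUE/FALSE into 1/0
pair_flags <- +(pair_counts$n == 1)

print(pair_counts)
print(pair_flags)
```

The answer's function computes the same 0/1 flag, but returns it once per original row rather than once per distinct pair, which is why its output has six elements.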

Answer 2

IMO purrr isn't a good fit here; you can try this function instead.

library(dplyr)

is_this_unique <- function(data, id_var) {
  temp_data <- data %>% filter(id %in% id_var)
  if (nrow(temp_data) > 0) {
    temp_data %>%
      add_count(id, full_address) %>%
      mutate(n = +(n == 1)) %>%
      pull(n)
  } else {
    0
  }
}

is_this_unique(sample_data, 1:2)
#[1] 0 1 0 0 1 0

is_this_unique(sample_data, 1)
#[1] 0 1 0

is_this_unique(sample_data, 0)
#[1] 0
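For readers who still want one value per id via map_dbl, as the question originally hoped, a minimal sketch follows. It assumes the intended rule is "an id counts as unique when it has exactly one distinct full_address", which is only one reading of the asker's code, so it yields one number per id rather than the per-row vector shown in the answers. The helper name is_id_unique is hypothetical.

```r
library(dplyr)
library(purrr)

# Same sample data as in the question
sample_data <- tibble::tribble(
  ~id, ~full_address,
  1, "abc",
  1, "bcd",
  1, "abc",
  2, "qaa",
  2, "xcv",
  2, "qaa"
)

# Hypothetical helper: returns 1 when `one_id` has exactly one distinct
# full_address in `data`, otherwise 0 (also 0 when the id is absent).
is_id_unique <- function(data, one_id) {
  n_distinct_addr <- data %>%
    filter(id == one_id) %>%
    distinct(full_address) %>%
    nrow()
  as.numeric(n_distinct_addr == 1)
}

id_var <- c(1, 2)

# map_dbl calls the helper once per id and collects the results in a double vector
result <- map_dbl(id_var, function(x) is_id_unique(sample_data, x))
print(result)
```

With this sample data both ids have two distinct addresses, so the result is 0 for each. Because map_dbl runs one filter per id, the grouped approaches in the answers will generally scale better on large data.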
