Filtering by values in vectors using purrr
I want to write a function that accepts two arguments: a data.frame and a vector (here called id_var). It then filters the data.frame by a value in id_var (e.g. the first value in the vector) and assigns the resulting data.frame to a variable called data_filt_by_var.
If the number of rows in data_filt_by_var is greater than one, it takes the same initial data.frame, filters by the same id_var value, selects the distinct values of end (end is the name of a column present in the data.frame), and gets the number of rows. If the number of rows is >= 1, it returns 1, else 0.
The problem is that it has to do this for each value in id_var. I cannot make this iteration work without using loops, which are not desirable. I wrote the following function, but it's not working.
is_this_unique = function(data, id_var) {
  data_filt_by_var = nrow(data[data$id == id_var, ])
  if (data_filt_by_var >= 1) {
    if (nrow(data[data$id == id_var, ] %>%
             distinct(full_address)) == 1) {
      return(1)
    }
  } else {
    return(0)
  }
}
sample_data = (tibble::tribble(~id, ~full_address,
                               1, 'abc',
                               1, 'bcd',
                               1, 'abc',
                               2, 'qaa',
                               2, 'xcv',
                               2, 'qaa'))
id_var = c(1, 2)
I was hoping to use map_dbl in this function.
The expected output would be:
Input:
> is_this_unique(sample_data, id_var)
Desired output:
[1] 0 1 0 1 0 1
The first 0 is because the first id and full_address pair (1 and abc) is not unique, and so on...
The function can be written in the tidyverse without any loops or purrr. This looks like a group_by frequency count after filtering for the ids passed into the function. In this case, we group by 'id' and the column that is needed (passed inside the curly-curly operator, {{}}), then create a 0/1 column by checking whether the number of rows in each group (n()) equals 1; the unary + converts the logical result to an integer. If we pass an id_var that is not in the dataset, the result would usually be integer(0), which can be changed to 0 with an if/else condition at the end.
library(dplyr)
is_this_unique <- function(data, id_var, colNm) {
  out <- data %>%
    # keep only the rows whose id is one of the requested values
    filter(id %in% id_var) %>%
    group_by(id, {{colNm}}) %>%
    # 1 if the (id, colNm) pair occurs exactly once, otherwise 0
    transmute(n = +(n() == 1)) %>%
    pull(n)
  # no matching rows gives integer(0); return 0 in that case
  if (length(out) > 0) out else 0
}
is_this_unique(sample_data, 1:2, full_address)
#[1] 0 1 0 0 1 0
is_this_unique(sample_data, 1, full_address)
#[1] 0 1 0
is_this_unique(sample_data, 0, full_address)
#[1] 0
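A likely reason the function in the question misbehaves is that data$id == id_var does not test membership when id_var has more than one element: == compares element by element and recycles the shorter vector, whereas %in% (as used in the filter above) checks each id against the whole set. The function also has no return value for the case where the inner condition is false. A minimal base R sketch of the first point, using the id column from the sample data typed in by hand:

```r
# The id column of sample_data and the ids to look up
id <- c(1, 1, 1, 2, 2, 2)
id_var <- c(1, 2)

# `==` recycles id_var to 1, 2, 1, 2, 1, 2 and compares position by position
eq_result <- id == id_var
print(eq_result)
#[1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE

# `%in%` asks, for every id, whether it appears anywhere in id_var
in_result <- id %in% id_var
print(in_result)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE
```

With ==, rows 2 and 5 are silently dropped even though their ids are in id_var.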
IMO, using purrr here isn't suitable; you can try this function instead.
library(dplyr)
is_this_unique <- function(data, id_var) {
  temp_data <- data %>% filter(id %in% id_var)
  if (nrow(temp_data) > 0)
    temp_data %>%
      add_count(id, full_address) %>%
      mutate(n = +(n == 1)) %>%
      pull(n)
  else return(0)
}
is_this_unique(sample_data, 1:2)
#[1] 0 1 0 0 1 0
is_this_unique(sample_data, 1)
#[1] 0 1 0
is_this_unique(sample_data, 0)
#[1] 0
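If one value per id is wanted rather than one value per row (this is an assumption about the intent, based on the prose description in the question), map_dbl can still be used to iterate over id_var. The sketch below is only an illustration: is_id_unique is a hypothetical name, the sample data is rebuilt as a plain data.frame, and the rule "1 when the id has exactly one distinct full_address, 0 otherwise or when the id is absent" is my reading of the question, not something the answers above confirm.

```r
library(purrr)

# Same values as sample_data in the question, as a plain data.frame
sample_data <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  full_address = c("abc", "bcd", "abc", "qaa", "xcv", "qaa")
)

# Hypothetical helper: returns one number per element of id_var
is_id_unique <- function(data, id_var) {
  map_dbl(id_var, function(v) {
    # addresses belonging to this single id
    addresses <- data$full_address[data$id == v]
    # an id that is not in the data gives 0
    if (length(addresses) == 0) return(0)
    # 1 if all rows of this id share one address, otherwise 0
    as.numeric(length(unique(addresses)) == 1)
  })
}

is_id_unique(sample_data, c(1, 2, 0))
#[1] 0 0 0
```

Here == is safe because v is always a single value inside the anonymous function. Both ids in the sample data have two distinct addresses, so every result is 0.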