How to apply a function to pairs of columns with purrr?
I have the following data frame resulting from a join with dplyr:
data_frame(id=1:4, a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4), b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4))
# A tibble: 4 x 5
     id   a.x   a.y   b.x   b.y
  <int> <dbl> <dbl> <dbl> <dbl>
1     1     1     1    NA     2
2     2    NA     2    NA     2
3     3     3     3     3    NA
4     4     4     4    NA     4
I would like to replace all the NAs in the columns ending with .x with the value from the corresponding columns ending with .y. Eventually, I would like to achieve this:
# A tibble: 4 x 5
     id   a.x   a.y   b.x   b.y
  <int> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     2     2
2     2     2     2     2     2
3     3     3     3     3    NA
4     4     4     4     4     4
I tried something like this with purrr:
data_frame(id=1:4, a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4), b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4)) %>%
map2_dfr(.x = ends_with('.y'), .y = ends_with('.x'), ~ case_when(is.na(.x) ~ .y,
TRUE ~ .x))
This is wrong. The documentation is a bit confusing to me. I think the issue here is that .x expects a vector, but how can I pass a list of columns then?
A tidyr solution. We can gather the columns, separate the column names at the ".", arrange by columns, fill the values upward, unite the columns, and finally spread the data frame back to the original structure.
library(tidyverse)
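dat2 <- dat %>%
  gather(Column, Value, -id) %>%                   # long format: one row per id and column
  separate(Column, into = c("Col1", "Col2")) %>%   # split "a.x" into "a" and "x"
  arrange(id, Col1, Col2) %>%                      # put each x row directly above its y row
  group_by(id, Col1) %>%
  fill(Value, .direction = "up") %>%               # a missing x takes the y value below it
  unite(Column, Col1, Col2, sep = ".") %>%         # rebuild the original column names
  spread(Column, Value) %>%                        # back to wide format
  ungroup()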
dat2
# # A tibble: 4 x 5
#      id   a.x   a.y   b.x   b.y
# * <int> <dbl> <dbl> <dbl> <dbl>
# 1     1  1.00  1.00  2.00  2.00
# 2     2  2.00  2.00  2.00  2.00
# 3     3  3.00  3.00  3.00    NA
# 4     4  4.00  4.00  4.00  4.00
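In recent tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same reshape-and-fill idea with those functions, assuming tidyr 1.0.0 or later. The names Col1, Col2 and value are only illustrative, and dat is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built with tibble()
dat <- tibble(id = 1:4,
              a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4),
              b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4))

dat2 <- dat %>%
  # Split "a.x" into Col1 = "a" and Col2 = "x" while going to long format
  pivot_longer(-id, names_to = c("Col1", "Col2"), names_sep = "\\.") %>%
  group_by(id, Col1) %>%
  # Within each id/Col1 group, make sure the x row sits above the y row
  arrange(Col2, .by_group = TRUE) %>%
  # A missing x value is filled from the y value below it
  fill(value, .direction = "up") %>%
  ungroup() %>%
  # Rebuild the original column names and return to wide format
  pivot_wider(names_from = c(Col1, Col2), values_from = value, names_sep = ".")

dat2
```

Because the fill happens inside each id/Col1 group, a value can only move from a .y column to its own .x partner, never across pairs.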
Or, if each .x column is immediately followed by its .y column in the data frame, we can use the transpose function from the data.table package. Be careful: the column types may change after the process, and because fill is applied to every row of the transposed data, an NA in a .y column is also filled from the column to its right.
dat2 <- dat %>%
  data.table::transpose() %>%
  fill(everything(), .direction = 'up') %>%
  data.table::transpose() %>%
  setNames(names(dat))
dat2
#   id a.x a.y b.x b.y
# 1  1   1   1   2   2
# 2  2   2   2   2   2
# 3  3   3   3   3  NA
# 4  4   4   4   4   4
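The warning about column types can be checked directly. The sketch below, which assumes the data.table package is installed, repeats the transpose approach and compares the column classes before and after. The final mutate step that restores the integer id is an addition for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built with tibble()
dat <- tibble(id = 1:4,
              a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4),
              b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4))

dat2 <- dat %>%
  data.table::transpose() %>%
  fill(everything(), .direction = "up") %>%
  data.table::transpose() %>%
  setNames(names(dat))

# Compare the classes: transposing mixes integer and double columns,
# so every column comes back with a common type
sapply(dat, class)
sapply(dat2, class)

# Hypothetical clean-up step: convert id back to integer if that matters
dat3 <- dat2 %>% mutate(id = as.integer(id))
```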
Or a solution using purrr: first create subsets of the columns whose names end with "x" and "y" using ends_with, and then replace the original columns ending with "x".
dat_x <- dat %>% select(ends_with("x"))
dat_y <- dat %>% select(ends_with("y"))
dat[, grepl("x$", names(dat))] <- map2(dat_x, dat_y, ~ifelse(is.na(.x), .y, .x))
dat
# # A tibble: 4 x 5
#      id   a.x   a.y   b.x   b.y
#   <int> <dbl> <dbl> <dbl> <dbl>
# 1     1  1.00  1.00  2.00  2.00
# 2     2  2.00  2.00  2.00  2.00
# 3     3  3.00  3.00  3.00    NA
# 4     4  4.00  4.00  4.00  4.00
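The attempt in the question fails because ends_with() is a selection helper that only works inside selecting functions such as select(), so it cannot be handed to map2() directly; map2() needs the columns themselves. A variation on the purrr solution, sketched below, pairs the columns by name rather than by position and uses dplyr::coalesce in place of ifelse. The helper vectors x_cols and y_cols are illustrative names, and the sketch assumes every .x column has a .y partner with the same prefix.

```r
library(dplyr)
library(purrr)

# Same example data as in the question, built with tibble()
dat <- tibble(id = 1:4,
              a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4),
              b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4))

# Names of the .x columns and of their .y partners (assumed to exist)
x_cols <- names(dat)[endsWith(names(dat), ".x")]
y_cols <- sub("\\.x$", ".y", x_cols)

# coalesce() keeps the .x value where present and falls back to .y otherwise
dat[x_cols] <- map2(dat[x_cols], dat[y_cols], coalesce)

dat
```

Deriving y_cols from x_cols keeps each pair aligned even if the columns are not in matching order in the data frame.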
DATA
dat <- data_frame(id=1:4, a.x = c(1, NA, 3, 4), a.y = c(1, 2, 3, 4), b.x = c(NA, NA, 3, NA), b.y = c(2, 2, NA, 4))