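Remove columns that are all NA for at least one level of a factor
I would like to tidy a data frame by removing variables that are entirely NA for any level of a grouping factor. Removing columns that are completely empty is fairly easy, but there does not seem to be a simple way to apply that selection per group.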
## Data
site<-c("A","A","A","A","A","B","B","B","B","B")
year<-c("2000","2001","2002","2003","2004","2000","2001","2002","2003","2004")
species_A<-c(1,2,3,4,5,NA,NA,NA,NA,NA)
species_B<-c(1,2,NA,4,5,NA,3,4,5,6)
species_C<-c(1,2,3,4,5,2,3,4,5,6)
dat<-data.frame(site,year,species_A,species_B,species_C)
site year species_A species_B species_C
1 A 2000 1 1 1
2 A 2001 2 2 2
3 A 2002 3 NA 3
4 A 2003 4 4 4
5 A 2004 5 5 5
6 B 2000 NA NA 2
7 B 2001 NA 3 3
8 B 2002 NA 4 4
9 B 2003 NA 5 5
10 B 2004 NA 6 6
library(dplyr)
## Remove columns with any NAs
dat %>%
group_by(site) %>%
select(where( ~!any(is.na(.x))))
## which returns
site year species_C
<chr> <chr> <dbl>
1 A 2000 1
2 A 2001 2
3 A 2002 3
4 A 2003 4
5 A 2004 5
6 B 2000 2
7 B 2001 3
8 B 2002 4
9 B 2003 5
10 B 2004 6
## Alternatively, if I use all() inside select(), it only drops columns that are NA across the whole data frame, not within a single site.
dat %>%
group_by(site) %>%
select(where( ~!all(is.na(.x))))
## However, I am trying to get...
site year species_B species_C
1 A 2000 1 1
2 A 2001 2 2
3 A 2002 NA 3
4 A 2003 4 4
5 A 2004 5 5
6 B 2000 NA 2
7 B 2001 3 3
8 B 2002 4 4
9 B 2003 5 5
10 B 2004 6 6
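It seems like this should be fairly simple, but for whatever reason I can't get it to work.
Thanks!
Another option:
library(dplyr)
# Names of the columns that have at least one non-NA value within every site
keep <- dat %>%
  group_by(site) %>%
  summarise(across(everything(), ~ !all(is.na(.x)))) %>%
  select(-site) %>%
  select(where(all)) %>%
  names()
# Keep the grouping column plus the columns that passed the check
dat %>%
  select(site, all_of(keep))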
site year species_B species_C
1 A 2000 1 1
2 A 2001 2 2
3 A 2002 NA 3
4 A 2003 4 4
5 A 2004 5 5
6 B 2000 NA 2
7 B 2001 3 3
8 B 2002 4 4
9 B 2003 5 5
10 B 2004 6 6
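For context on why the `summarise()` step is needed: as far as I understand it, `select(where(...))` applies the predicate to each whole column and does not evaluate it per group, so adding `group_by(site)` in front of it changes nothing. The sketch below rebuilds the question's data and compares the two calls; the names `grouped_cols`, `ungrouped_cols` and `per_site` are only illustrative.

```r
library(dplyr)

# Same data as in the question
dat <- data.frame(
  site = rep(c("A", "B"), each = 5),
  year = rep(as.character(2000:2004), times = 2),
  species_A = c(1, 2, 3, 4, 5, NA, NA, NA, NA, NA),
  species_B = c(1, 2, NA, 4, 5, NA, 3, 4, 5, 6),
  species_C = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# where() tests each full column, so grouping makes no difference here
grouped_cols <- dat %>%
  group_by(site) %>%
  select(where(~ !all(is.na(.x)))) %>%
  names()
ungrouped_cols <- dat %>%
  select(where(~ !all(is.na(.x)))) %>%
  names()
identical(grouped_cols, ungrouped_cols)

# summarise() does respect the groups: one TRUE/FALSE per site and column
per_site <- dat %>%
  group_by(site) %>%
  summarise(across(starts_with("species"), ~ !all(is.na(.x))))
per_site
```

In `per_site`, a column should be kept only if its value is `TRUE` for every site, which is what `select(where(all))` checks in the answer above.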
You can pivot to long format, drop the variables, and then pivot back to wide format.
library(tidyverse)
dat %>%
tidyr::pivot_longer(!c(site, year), names_to = "species", values_to = "values") %>%
dplyr::group_by(site, species) %>%
dplyr::mutate(allNA = all(is.na(values))) %>%
dplyr::ungroup(site) %>%
dplyr::filter(!any(allNA == TRUE)) %>%
dplyr::select(-allNA) %>%
tidyr::pivot_wider(names_from = "species", values_from = "values")
Output
# A tibble: 10 × 4
site year species_B species_C
<chr> <chr> <dbl> <dbl>
1 A 2000 1 1
2 A 2001 2 2
3 A 2002 NA 3
4 A 2003 4 4
5 A 2004 5 5
6 B 2000 NA 2
7 B 2001 3 3
8 B 2002 4 4
9 B 2003 5 5
10 B 2004 6 6
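If it helps to see what the long-format step is flagging, here is a small sketch (not part of the answer's pipeline) that collapses the long data to one row per site and species. The `flags` name is just for illustration, and it uses `summarise()` instead of `mutate()` so the result stays short.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
dat <- data.frame(
  site = rep(c("A", "B"), each = 5),
  year = rep(as.character(2000:2004), times = 2),
  species_A = c(1, 2, 3, 4, 5, NA, NA, NA, NA, NA),
  species_B = c(1, 2, NA, 4, 5, NA, 3, 4, 5, 6),
  species_C = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# One row per site/species pair, flagged TRUE when every value is NA
flags <- dat %>%
  pivot_longer(!c(site, year), names_to = "species", values_to = "values") %>%
  group_by(site, species) %>%
  summarise(allNA = all(is.na(values)), .groups = "drop")

# Species that are entirely NA in at least one site
unique(flags$species[flags$allNA])
```

Any species listed by the last line is one that the `filter()` step in the pipeline removes before pivoting back to wide format.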
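We can split by site, then use select(where(~!all(is.na(.x)))) to drop the all-NA columns from each data frame, and finally subset dat by the intersection of the remaining column names.
library(dplyr)
library(purrr)
# \(x) is the anonymous-function shorthand and needs R >= 4.1
dat %>% split(.$site) %>%
  map(\(x) select(x, where(~!all(is.na(.x))))) %>%
  map(names) %>%
  reduce(intersect) %>%
  dat[.]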
site year species_B species_C
1 A 2000 1 1
2 A 2001 2 2
3 A 2002 NA 3
4 A 2003 4 4
5 A 2004 5 5
6 B 2000 NA 2
7 B 2001 3 3
8 B 2002 4 4
9 B 2003 5 5
10 B 2004 6 6
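The same per-site check can also be sketched in base R without any packages. This is only one possible way to write it, and it assumes that `site` and `year` are the only identifier columns; `value_cols`, `has_empty_site` and `result` are made-up names.

```r
# Same data as in the question
dat <- data.frame(
  site = rep(c("A", "B"), each = 5),
  year = rep(as.character(2000:2004), times = 2),
  species_A = c(1, 2, 3, 4, 5, NA, NA, NA, NA, NA),
  species_B = c(1, 2, NA, 4, 5, NA, 3, 4, 5, 6),
  species_C = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# Assumption: every column except site and year is a measurement column
value_cols <- setdiff(names(dat), c("site", "year"))

# For each measurement column: is it entirely NA within at least one site?
has_empty_site <- vapply(
  dat[value_cols],
  function(col) any(tapply(is.na(col), dat$site, all)),
  logical(1)
)

# Keep the identifier columns and the measurement columns that passed
result <- dat[c("site", "year", value_cols[!has_empty_site])]
result
```

`tapply()` computes `all(is.na(...))` once per site, and `any()` then flags the column if a single site is completely empty, which matches the "all NA for at least one level" condition in the title.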