
Remove columns that are all NA for at least one level of a factor

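I want to tidy a dataframe by removing variables that are empty (all NA) for any level of a grouping factor. Removing columns that are entirely empty is fairly easy, but there does not appear to be a simple way to apply this selection to groups.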

## Data

site<-c("A","A","A","A","A","B","B","B","B","B")
year<-c("2000","2001","2002","2003","2004","2000","2001","2002","2003","2004")
species_A<-c(1,2,3,4,5,NA,NA,NA,NA,NA)
species_B<-c(1,2,NA,4,5,NA,3,4,5,6)
species_C<-c(1,2,3,4,5,2,3,4,5,6)

dat<-data.frame(site,year,species_A,species_B,species_C)


  site year species_A species_B species_C
1     A 2000         1         1         1
2     A 2001         2         2         2
3     A 2002         3        NA         3
4     A 2003         4         4         4
5     A 2004         5         5         5
6     B 2000        NA        NA         2
7     B 2001        NA         3         3
8     B 2002        NA         4         4
9     B 2003        NA         5         5
10    B 2004        NA         6         6
 


## Remove columns with any NAs

library(dplyr)

dat %>%
  group_by(site) %>%
  select(where( ~!any(is.na(.x))))


## which returns 

   site  year  species_C
   <chr> <chr>     <dbl>
 1 A     2000          1
 2 A     2001          2
 3 A     2002          3
 4 A     2003          4
 5 A     2004          5
 6 B     2000          2
 7 B     2001          3
 8 B     2002          4
 9 B     2003          5
10 B     2004          6



## Alternatively, if I use "all" in select, it only identifies columns that are NA in every row, ignoring the groups.

dat %>% 
  group_by(site) %>%
  select(where( ~!all(is.na(.x))))



## However, I am trying to get...
 
   site year species_B species_C
1     A 2000         1         1
2     A 2001         2         2
3     A 2002        NA         3
4     A 2003         4         4
5     A 2004         5         5
6     B 2000        NA         2
7     B 2001         3         3
8     B 2002         4         4
9     B 2003         5         5
10    B 2004         6         6

It seems like this should be fairly simple, but for whatever reason I can't seem to get it to work.

Thanks!

Another option:

dat %>%
  select(
    site,
    # names of the columns that have at least one non-NA value in every site
    dat %>%
      group_by(site) %>%
      summarise(across(everything(), ~ !all(is.na(.x)))) %>%
      ungroup() %>%
      select(-site) %>%
      select(where(all)) %>%
      names()
  )

   site year species_B species_C
1     A 2000         1         1
2     A 2001         2         2
3     A 2002        NA         3
4     A 2003         4         4
5     A 2004         5         5
6     B 2000        NA         2
7     B 2001         3         3
8     B 2002         4         4
9     B 2003         5         5
10    B 2004         6         6

You can pivot to long format, drop the variables, and then pivot back to wide format.

library(tidyverse)

dat %>%
  tidyr::pivot_longer(!c(site, year), names_to = "species", values_to = "values") %>%
  dplyr::group_by(site, species) %>%
  dplyr::mutate(allNA = all(is.na(values))) %>%
  dplyr::ungroup(site) %>%
  dplyr::filter(!any(allNA == TRUE)) %>%
  dplyr::select(-allNA) %>%
  tidyr::pivot_wider(names_from = "species", values_from = "values")

Output

# A tibble: 10 × 4
   site  year  species_B species_C
   <chr> <chr>     <dbl>     <dbl>
 1 A     2000          1         1
 2 A     2001          2         2
 3 A     2002         NA         3
 4 A     2003          4         4
 5 A     2004          5         5
 6 B     2000         NA         2
 7 B     2001          3         3
 8 B     2002          4         4
 9 B     2003          5         5
10 B     2004          6         6

We can split by site, then use select(where(~!all(is.na(.x)))) to drop the all-NA columns from each dataframe, and finally subset dat by the intersection of the column names.

library(dplyr)
library(purrr)

dat %>%
  split(.$site) %>%
  map(\(x) select(x, where(~ !all(is.na(.x))))) %>%
  map(names) %>%
  reduce(intersect) %>%
  dat[.]

   site year species_B species_C
1     A 2000         1         1
2     A 2001         2         2
3     A 2002        NA         3
4     A 2003         4         4
5     A 2004         5         5
6     B 2000        NA         2
7     B 2001         3         3
8     B 2002         4         4
9     B 2003         5         5
10    B 2004         6         6
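
For comparison, the same "all NA within at least one site" test can be sketched in base R, without dplyr or purrr. This is only a minimal sketch under the assumptions of the example data above; the helper name `all_na_in_some_group` and the variables `to_drop` and `result` are made up for illustration and do not come from any package.

```r
# Example data, same values as in the question
dat <- data.frame(
  site = rep(c("A", "B"), each = 5),
  year = rep(as.character(2000:2004), times = 2),
  species_A = c(1, 2, 3, 4, 5, NA, NA, NA, NA, NA),
  species_B = c(1, 2, NA, 4, 5, NA, 3, 4, 5, 6),
  species_C = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# Hypothetical helper: TRUE if `col` is entirely NA within at least one group.
# tapply() applies all() to the NA flags of each group separately;
# na.rm = TRUE ignores the NA that tapply() returns for empty factor levels.
all_na_in_some_group <- function(col, grp) {
  any(tapply(is.na(col), grp, all), na.rm = TRUE)
}

# One logical per column: should this column be dropped?
to_drop <- vapply(dat, all_na_in_some_group, logical(1), grp = dat$site)

# Keep only the columns that have data in every site
result <- dat[, !to_drop, drop = FALSE]
names(result)
```

With the example data this should keep site, year, species_B and species_C, the same columns as the answers above. Because the check runs column by column, the grouping variable itself passes through unchanged as long as it contains no NA.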
