In a dataframe, I only want to keep rows that have at least one variables starting with DSDECOD is NOT empty. How can I do that?
it seems that following code works.
ds_sub <- subset(ds_supp, (DSDECOD1 !="" | DSDECOD2 !="" |
DSDECOD3 !="" | DSDECOD4 !=""))
But is there simple way so that I don't have to write out all of the variables starting with DSDECOD?
Maybe using rowSums
and grepl
:
ds_supp[rowSums(ds_supp[, grepl("^DSDECOD", names(ds_supp))]!="")>0,]
ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1 1 B
2 2 A A A
3 3 B B
5 5 C C C
6 6 D D
Data :
ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1 1 B
2 2 A A A
3 3 B B
4 4 # <- empty row
5 5 C C C
6 6 D D
You could try using select
and the remove_empty
function from the janitor
package?
ds_sub %>%
select(contains("DSDECOD")) %>%
janitor::remove_empty(.)
This regex solution works:
df[-which(grepl("\\d$", apply(df, 1, paste0, collapse = ""))),]
id DSDECOD1 DSDECOD2 DSDECOD3
1 1 A
2 2 B
3 3 A
4 4 B
8 8 A
9 9 B
10 10 A
This solution works by paste0
ing the rows together and then subtracting from the dataframe those strings which
end ( $
) on a digit ( \\\\d
), which happens only when the DSDECOD
rows are empty:
Reproducible data :
df <- data.frame(
id = 1:10,
DSDECOD1 = c("", "B", rep("",8)),
DSDECOD2 = c("A","","","B","","","","","",""),
DSDECOD3 = c("", "", "A", "","","","", "A", "B", "A"))
df
id DSDECOD1 DSDECOD2 DSDECOD3
1 1 A
2 2 B
3 3 A
4 4 B
5 5 # empty
6 6 # empty
7 7 # empty
8 8 A
9 9 B
10 10 A
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.