I have a dataset named DF1, which looks like this:
V1 V2 V3 V4 V5 V6
A01N A01N A01P Null Null Null
C09K A61K C09D C08K Null Null
A61K A61P A61P A61K A61K A61K
A01D A01D A01D A01D A01D Null
E06A Null Null Null Null Null
I also have a vector named V:
(A01N C09K A01D)
I want to subset DF1 based on the elements of V: if a row of DF1 contains any element of V, in any column, keep the row; otherwise drop it. The result should be:
V1 V2 V3 V4 V5 V6
A01N A01N A01P Null Null Null
C09K A61K C09D C08K Null Null
I tried to use subset(): test_t1 <- subset(DF1, DF1[,1:6] %in% V)
but I only know how to subset on a single column or row. How do I handle multiple columns?
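For reference, a minimal sketch of why the attempt above does not test row by row. It assumes DF1 holds plain character columns and rebuilds DF1 and V from the tables in the question. The names per_column and per_row_v1 are only illustrative.

```r
# DF1 and V rebuilt from the question (assumed: character columns, "Null" as text).
DF1 <- data.frame(
  V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
  V2 = c("A01N", "A61K", "A61P", "A01D", "Null"),
  V3 = c("A01P", "C09D", "A61P", "A01D", "Null"),
  V4 = c("Null", "C08K", "A61K", "A01D", "Null"),
  V5 = c("Null", "Null", "A61K", "A01D", "Null"),
  V6 = c("Null", "Null", "A61K", "Null", "Null"),
  stringsAsFactors = FALSE
)
V <- c("A01N", "C09K", "A01D")

# Applied to a data frame, %in% treats each column as a single item,
# so the result has one value per column rather than one per row.
per_column <- DF1[, 1:6] %in% V
length(per_column)   # equals ncol(DF1), not nrow(DF1)

# Applied to one column, %in% is element-wise: one value per row.
per_row_v1 <- DF1$V1 %in% V
subset(DF1, V1 %in% V)
```

So the condition passed to subset() has to be built column by column and then combined per row.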
Try reshaping with tidyverse functions. Pivot the columns to long format, compare the values with the vector, filter, and then reshape back to wide. Here is the code:
library(tidyverse)
#Data
vec <- c('A01N','C09K','A01D')
#Code
new <- df %>% mutate(id=row_number()) %>%
  pivot_longer(-id) %>%
  mutate(Flag=+(value%in%vec)) %>%
  group_by(id) %>%
  mutate(Sum=sum(Flag)) %>%
  filter(Sum>=1) %>%
  select(-c(Flag,Sum)) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  ungroup %>% select(-id)
Output:
# A tibble: 3 x 6
V1 V2 V3 V4 V5 V6
<chr> <chr> <chr> <chr> <chr> <chr>
1 A01N A01N A01P Null Null Null
2 C09K A61K C09D C08K Null Null
3 A01D A01D A01D A01D A01D Null
Or using base R with apply():
#Code2
new <- df[apply(df,1,function(x) ifelse(sum(x %in% vec)>=1,1,0))==1,]
Output:
V1 V2 V3 V4 V5 V6
1 A01N A01N A01P Null Null Null
2 C09K A61K C09D C08K Null Null
4 A01D A01D A01D A01D A01D Null
Some data used:
#Data
df <- structure(list(V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
V2 = c("A01N", "A61K", "A61P", "A01D", "Null"), V3 = c("A01P",
"C09D", "A61P", "A01D", "Null"), V4 = c("Null", "C08K", "A61K",
"A01D", "Null"), V5 = c("Null", "Null", "A61K", "A01D", "Null"
), V6 = c("Null", "Null", "A61K", "Null", "Null")), class = "data.frame", row.names = c(NA,
-5L))
If having too many variables causes issues, here is a simplified version of the code (many thanks to GregorThomas):
#Code1
new <- df %>% mutate(id=row_number()) %>%
  pivot_longer(-id) %>%
  group_by(id) %>%
  filter(sum(value %in% vec) > 0) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  ungroup %>% select(-id)
#Code2
new <- df[apply(df,1,function(x) sum(x %in% vec)>=1),]
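For comparison, here is a sketch that filters without reshaping. It is not part of the answer above and assumes dplyr 1.0.4 or later, where if_any() is available. The data frame df and the vector vec are rebuilt here so the block runs on its own.

```r
library(dplyr)

# Same data as in this answer, rebuilt inline.
df <- data.frame(
  V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
  V2 = c("A01N", "A61K", "A61P", "A01D", "Null"),
  V3 = c("A01P", "C09D", "A61P", "A01D", "Null"),
  V4 = c("Null", "C08K", "A61K", "A01D", "Null"),
  V5 = c("Null", "Null", "A61K", "A01D", "Null"),
  V6 = c("Null", "Null", "A61K", "Null", "Null"),
  stringsAsFactors = FALSE
)
vec <- c("A01N", "C09K", "A01D")

# Keep a row if any of its columns holds a value found in vec.
new <- df %>% filter(if_any(everything(), ~ .x %in% vec))
new
```

This avoids the helper id column and the long/wide round trip, at the cost of needing a recent dplyr.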
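This is a simple one-liner in base R (`%in%` has to be applied to each column with sapply(), because rowSums() needs a matrix):
DF1[rowSums(sapply(DF1, `%in%`, vec)) > 0, ]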
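A minimal sketch of the intermediate steps behind a rowSums() filter of this kind, assuming DF1 and vec as defined elsewhere in this thread (rebuilt here so the block is self-contained). The names m and hits are only illustrative.

```r
DF1 <- data.frame(
  V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
  V2 = c("A01N", "A61K", "A61P", "A01D", "Null"),
  V3 = c("A01P", "C09D", "A61P", "A01D", "Null"),
  V4 = c("Null", "C08K", "A61K", "A01D", "Null"),
  V5 = c("Null", "Null", "A61K", "A01D", "Null"),
  V6 = c("Null", "Null", "A61K", "Null", "Null"),
  stringsAsFactors = FALSE
)
vec <- c("A01N", "C09K", "A01D")

# Apply %in% to each column; sapply() binds the results into a
# logical matrix with one row per row of DF1 and one column per variable.
m <- sapply(DF1, `%in%`, vec)

# Count the matches in each row, then keep rows with at least one match.
hits <- rowSums(m)
res <- DF1[hits > 0, ]
res
```

Since the original row names are kept, the result shows which rows of DF1 survived the filter.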
An option in base R can be
subset(DF1, Reduce(`+`, lapply(DF1, `%in%`, vec)) > 0)
Output:
# V1 V2 V3 V4 V5 V6
#1 A01N A01N A01P Null Null Null
#2 C09K A61K C09D C08K Null Null
#4 A01D A01D A01D A01D A01D Null
DF1 <- structure(list(V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
V2 = c("A01N", "A61K", "A61P", "A01D", "Null"), V3 = c("A01P",
"C09D", "A61P", "A01D", "Null"), V4 = c("Null", "C08K", "A61K",
"A01D", "Null"), V5 = c("Null", "Null", "A61K", "A01D", "Null"
), V6 = c("Null", "Null", "A61K", "Null", "Null")),
class = "data.frame", row.names = c(NA,
-5L))
vec <- c('A01N','C09K','A01D')
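To make the Reduce() step easier to follow, here is a sketch that splits it into parts, using the same DF1 and vec (rebuilt in compact form). The names hits_per_col, n_hits and keep_any are only illustrative, and the variant with `|` is an alternative shown for comparison, not part of the answer above.

```r
DF1 <- data.frame(
  V1 = c("A01N", "C09K", "A61K", "A01D", "E06A"),
  V2 = c("A01N", "A61K", "A61P", "A01D", "Null"),
  V3 = c("A01P", "C09D", "A61P", "A01D", "Null"),
  V4 = c("Null", "C08K", "A61K", "A01D", "Null"),
  V5 = c("Null", "Null", "A61K", "A01D", "Null"),
  V6 = c("Null", "Null", "A61K", "Null", "Null"),
  stringsAsFactors = FALSE
)
vec <- c("A01N", "C09K", "A01D")

# One logical vector per column: TRUE where the value is in vec.
hits_per_col <- lapply(DF1, `%in%`, vec)

# Adding the vectors element-wise gives the number of matches per row.
n_hits <- Reduce(`+`, hits_per_col)

# Combining them with `|` gives "any match in this row" directly.
keep_any <- Reduce(`|`, hits_per_col)

# Both conditions select the same rows.
stopifnot(identical(n_hits > 0, keep_any))
DF1[keep_any, ]
```

The `+` version also yields the match count per row, which can be useful when a threshold other than "at least one" is wanted.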