I'm new to R, and I'm having trouble with dplyr's filter function. I have a df and need to keep the rows where a value is greater than a given number, but some rows hold multiple values (separated by ;). For example, I have this df:
ID value1 value2
1 1;0;3.4 4
2 3 5
3 0.5;2;1.3 0
4 5;0.1 3
My filter should keep a row if one or more of the values in value1 is greater than or equal to 3. I'm using dplyr's filter function because I need to apply other filters to the df as well. This is my code:
filt <- df %>% filter(any(as.numeric(unlist(strsplit(value1,';',fixed=TRUE))) >=3))
But in this case the any function considers all the values in the df (not row by row, as I expected), so I get the whole df back, which is not correct.
I need to obtain the rows with ID 1, 2 and 4 for this example.
I think I should check row by row after splitting on ;, but I don't know how to do this with dplyr's filter function.
Thanks so much!
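To see why the original attempt keeps every row, here is a minimal base R sketch using the same value1 strings. The names all_values, collapsed and per_row are illustrative only. unlist() flattens the split pieces from all rows into one vector, so any() yields a single TRUE, and filter() recycles that single value across every row.

```r
value1 <- c("1;0;3.4", "3", "0.5;2;1.3", "5;0.1")

# unlist() merges the pieces from all rows into one numeric vector
all_values <- as.numeric(unlist(strsplit(value1, ";", fixed = TRUE)))

# a single logical for the whole column, not one per row
collapsed <- any(all_values >= 3)

# one logical per row: test each row's pieces separately
per_row <- vapply(
  strsplit(value1, ";", fixed = TRUE),
  function(x) any(as.numeric(x) >= 3),
  logical(1)
)
```

Here collapsed has length 1, while per_row has one element per row, which is the shape filter() needs in order to decide row by row.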
One option is to split 'value1' with separate_rows from tidyr, group by 'ID', filter the groups that have any element of 'value1' greater than or equal to 3, then summarise the columns by pasting the 'value1' elements back together and taking the first element of 'value2'.
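library(dplyr)
library(tidyr)
library(stringr)  # provides str_c()
df %>%
  # one row per ';'-separated element; convert = TRUE makes value1 numeric
  separate_rows(value1, sep = ";", convert = TRUE) %>%
  group_by(ID) %>%
  # keep every row of an ID whose group has at least one value >= 3
  filter(any(value1 >= 3)) %>%
  # collapse back to one row per ID
  summarise(value1 = str_c(value1, collapse = ";"), value2 = first(value2))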
# A tibble: 3 x 3
# ID value1 value2
# <int> <chr> <int>
#1 1 1;0;3.4 4
#2 2 3 5
#3 4 5;0.1 3
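It may help to look at the intermediate long format that the grouped filter operates on. The sketch below rebuilds the example df and stops after separate_rows; the name long is only illustrative.

```r
library(tidyr)

df <- data.frame(
  ID = 1:4,
  value1 = c("1;0;3.4", "3", "0.5;2;1.3", "5;0.1"),
  value2 = c(4L, 5L, 0L, 3L),
  stringsAsFactors = FALSE
)

# each ';'-separated element of value1 gets its own row;
# ID and value2 are repeated alongside it
long <- separate_rows(df, value1, sep = ";", convert = TRUE)

# 3 + 1 + 3 + 2 = 9 rows, with value1 now numeric
nrow(long)
```

Because every element now sits on its own row next to its ID, any(value1 >= 3) inside group_by(ID) is evaluated once per ID rather than once for the whole column.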
Or use map_lgl from purrr with strsplit, which tests each row's values separately:
library(purrr)
df %>%
filter(map_lgl(strsplit(value1, ";"), ~ any(as.numeric(.x) >=3)))
# ID value1 value2
#1 1 1;0;3.4 4
#2 2 3 5
#3 4 5;0.1 3
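Since the question asks about checking row by row inside filter, a third possibility is dplyr's rowwise(). This is a sketch rather than part of the approaches above; it rebuilds the example df so it runs on its own, and the name res is illustrative.

```r
library(dplyr)

df <- data.frame(
  ID = 1:4,
  value1 = c("1;0;3.4", "3", "0.5;2;1.3", "5;0.1"),
  value2 = c(4L, 5L, 0L, 3L),
  stringsAsFactors = FALSE
)

res <- df %>%
  rowwise() %>%   # evaluate the filter condition one row at a time
  filter(any(as.numeric(strsplit(value1, ";", fixed = TRUE)[[1]]) >= 3)) %>%
  ungroup()       # drop the rowwise grouping before further steps

res$ID
```

With rowwise(), value1 is a single string inside filter, so [[1]] extracts that row's pieces and any() applies to them alone. Other filter conditions can be added after ungroup() as usual.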
Data used in the examples:
df <- structure(list(ID = 1:4, value1 = c("1;0;3.4", "3", "0.5;2;1.3",
"5;0.1"), value2 = c(4L, 5L, 0L, 3L)), class = "data.frame", row.names = c(NA,
-4L))