I have a dataframe that I need to split into several dataframes, based on regex searches. There is no set pattern to the searches, i.e. sometimes there is a single regex, sometimes a combination of several. Here is a minimal example with just one set of rows extracted:
library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City)
# Extract the matching rows into a new dataframe
sub_df <- main_df %>%
  filter(grepl("J", Name))
# Remove the extracted rows from the main dataframe
main_df <- main_df %>%
  filter(!grepl("J", Name))
Note that I am extracting some rows into a new dataframe, then deleting the extracted rows from the main dataframe.
I am looking for a single-line command to do this. Help appreciated, especially if using dplyr.
We can write a function like
split_df <- function(df, char) {
  split(df, grepl(char, df$Name))
}
new_df <- split_df(main_df, "J")
new_df[[1]]
# Name Age City
#3 Arthur 31 New York
#4 Maggie 33 Delhi
new_df[[2]]
# Name Age City
#1 John 20 London
#2 Jane 30 Paris
In place of char, make sure to pass the appropriate pattern to split on. You can also use a regex for char, like ^J (starts with J) or J$ (ends with J).
For example, new_df <- split_df(main_df, "^J") would give the same output as above.
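Since the question also mentions combinations of several regexes, here is a minimal sketch of how split_df could be extended to take more than one pattern. The helper name split_df_any and its column argument are hypothetical, and it assumes a row should be extracted when it matches at least one of the patterns.

```r
# Hypothetical helper: split on whether a column matches ANY of several patterns
split_df_any <- function(df, patterns, column = "Name") {
  # One logical vector per pattern, combined element-wise with OR
  hits <- Reduce(`|`, lapply(patterns, grepl, x = df[[column]]))
  split(df, hits)
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

parts <- split_df_any(main_df, c("^J", "ie$"))
parts[["TRUE"]]   # rows matching at least one pattern
parts[["FALSE"]]  # remaining rows
```

Indexing the result by name ("TRUE" / "FALSE") rather than by position avoids picking the wrong piece when one of the two groups happens to be empty, in which case split() returns only one element.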
I think the following will allow you to extract rows based on multiple conditions from the original df and delete them from the original, using dplyr as requested.
library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
# One logical column per condition; add further columns for more conditions
conditions <- data.frame(grepl("J", main_df$Name))
extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)
  # One extracted dataframe per condition
  newdfs.list <- lapply(seq_len(ncol(condf)), function(i) x %>% filter(condf[, i]))
  # Keep only the rows that match none of the conditions
  keep <- rowSums(condf) == 0
  # newmain is created in the global environment via <<-
  newmain <<- x %>% filter(keep)
  return(newdfs.list)
}
ndflist <- extractanddelete(main_df, conditions)
newmain
ndflist
> newmain
Name Age City
1 Arthur 31 New York
2 Maggie 33 Delhi
> ndflist
[[1]]
Name Age City
1 John 20 London
2 Jane 30 Paris
You receive a list containing as many elements as the conditions you use for filtering and deleting.
For completeness, you can then do main_df <- newmain.
This solution also works for conditions other than grepl.
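As a variation on the same idea, the sketch below returns both the extracted pieces and the remaining rows from the function, so nothing has to be assigned globally with <<-. The helper name extract_rows and the element names extracted and remaining are hypothetical; it assumes each condition is a logical vector with one entry per row and that none of the conditions contain NA.

```r
library(dplyr)

# Hypothetical helper: returns the extracted pieces and the leftover rows
extract_rows <- function(df, ...) {
  # Each argument in ... is a logical vector with one entry per row of df
  conds <- list(...)
  extracted <- lapply(conds, function(cond) filter(df, cond))
  # A row is removed from the remainder if any condition matched it
  matched_any <- Reduce(`|`, conds)
  list(extracted = extracted, remaining = filter(df, !matched_any))
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

res <- extract_rows(main_df, grepl("J", main_df$Name), main_df$Age > 32)
res$extracted[[1]]        # rows whose Name contains "J"
res$extracted[[2]]        # rows with Age above 32
main_df <- res$remaining  # rows that matched no condition
```

Returning a list keeps the function free of side effects; the caller decides whether to overwrite main_df.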
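I achieve it with mapply(), which applies assign() in parallel over the vector of names and the list of dataframes returned by split(). split() returns the FALSE group first, so main_df receives the non-matching rows and sub_df the matching ones.
Note: pos = 1 is necessary so that assign() writes to the global environment (the first position on the search path).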
mapply(FUN = assign, x = c("main_df", "sub_df"),
value = split(main_df, grepl("J", main_df$Name)),
pos = 1)
main_df
# Name Age City
# 3 Arthur 31 New York
# 4 Maggie 33 Delhi
sub_df
# Name Age City
# 1 John 20 London
# 2 Jane 30 Paris
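A similar effect can probably be had with base R's list2env(), which copies the elements of a named list into an environment. This is only a sketch: the intermediate name parts is hypothetical, and the factor() call with explicit levels is an added assumption so that both groups exist even when no row (or every row) matches.

```r
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Fixing the levels keeps both groups, in the order FALSE then TRUE
matched <- factor(grepl("J", main_df$Name), levels = c(FALSE, TRUE))
parts <- split(main_df, matched)
names(parts) <- c("main_df", "sub_df")

# Overwrite main_df and create sub_df in the global environment
invisible(list2env(parts, envir = globalenv()))
```

After this runs, main_df holds the non-matching rows and sub_df the matching ones, as in the mapply() version.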