Splitting a dataframe into several dataframes
I have a dataframe that I need to split into several dataframes based on regex searches. There is no set pattern to the searches, i.e. sometimes there is a single regex, sometimes a combination of several. Here is a minimal example with just one set of rows extracted:
library(dplyr)
Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City)
sub_df <- main_df %>%
  filter(grepl("J", Name))
main_df <- main_df %>%
  filter(!grepl("J", Name))
Note that I am extracting some rows into a new dataframe, then deleting the extracted rows from the main dataframe.
I am looking for a single-line command to do this. Help appreciated, especially if using dplyr.
We can write a function like this:
split_df <- function(df, char) {
  # Splits on the Name column: non-matching rows (FALSE) come first, matching rows (TRUE) second
  split(df, grepl(char, df$Name))
}
new_df <- split_df(main_df, "J")
new_df[[1]]
# Name Age City
#3 Arthur 31 New York
#4 Maggie 33 Delhi
new_df[[2]]
# Name Age City
#1 John 20 London
#2 Jane 30 Paris
In place of char, pass the appropriate pattern to split on. Since char is passed straight to grepl(), it can be any regular expression, such as ^J (starts with J) or J$ (ends with J).
For example,
new_df <- split_df(main_df, "^J")
would give the same output as above.
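Because split() names its pieces after the logical values, they can also be picked out by name instead of position. This is a little safer: if a pattern matches no rows, there is simply no "TRUE" element rather than a shifted index. The grouping vector can also combine several tests. A minimal sketch in base R; the Age > 32 test is only an illustration of mixing a regex with another condition:

```r
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Several tests combined into one logical vector:
# a regex on Name OR a numeric test on Age (illustrative only)
hit <- grepl("^J", main_df$Name) | main_df$Age > 32

# split() on a logical vector names its pieces "FALSE" and "TRUE"
parts <- split(main_df, hit)
sub_df <- parts[["TRUE"]]    # John, Jane, Maggie
main_df <- parts[["FALSE"]]  # Arthur
```

Note that hit is computed before main_df is overwritten, so both pieces come from the same original rows.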
I think the following will allow you to extract rows based on multiple conditions from the original df and delete them from the original, using dplyr as requested.
library(dplyr)
Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
# One logical column per condition; for several, use e.g. cbind(grepl("J", main_df$Name), main_df$Age > 30)
conditions <- cbind(grepl("J", main_df$Name))
extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)
  # One data frame per condition, holding the rows that satisfy it
  newdfs.list <- lapply(seq_len(ncol(condf)), function(i) x %>% filter(condf[, i]))
  # Rows that satisfy none of the conditions; assigned globally as newmain
  newmain <<- x %>% filter(rowSums(condf) == 0)
  return(newdfs.list)
}
ndflist <- extractanddelete(main_df, conditions)
newmain
ndflist
> newmain
Name Age City
1 Arthur 31 New York
2 Maggie 33 Delhi
> ndflist
[[1]]
Name Age City
1 John 20 London
2 Jane 30 Paris
You receive a list containing one data frame per condition used for filtering and deleting.
For completeness, you can then do main_df <- newmain.
This solution also works for conditions other than grepl.
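If the pieces should instead be taken one after another, so that a row already extracted by an earlier pattern cannot be picked up again by a later one, a plain loop in base R is enough. This sketch is not part of the answer above; the patterns vector and its names are hypothetical, and each entry may itself combine alternatives with |:

```r
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Hypothetical named patterns; the second one is a combination of two regexes
patterns <- c(starts_j = "^J", starts_a_or_z = "^A|^Z")

extracted <- list()
for (nm in names(patterns)) {
  # Test only the rows still left in main_df
  hit <- grepl(patterns[[nm]], main_df$Name)
  extracted[[nm]] <- main_df[hit, , drop = FALSE]  # move matches out...
  main_df <- main_df[!hit, , drop = FALSE]         # ...and drop them from main_df
}
```

After the loop, extracted holds one data frame per pattern and main_df keeps only the rows that no pattern matched (here, Maggie).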
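I achieve it with mapply(), which applies assign() to multiple list (vector) arguments.
Note: pos = 1 is necessary. It makes assign() write into the global environment (position 1 on the search path); with the default, the objects would be created in a temporary environment inside the mapply() call and lost.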
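# split() returns the non-matching (FALSE) group first, so it goes to main_df and the matches to sub_df
mapply(FUN = assign, x = c("main_df", "sub_df"),
       value = split(main_df, grepl("J", main_df$Name)),
       pos = 1)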
main_df
# Name Age City
# 3 Arthur 31 New York
# 4 Maggie 33 Delhi
sub_df
# Name Age City
# 1 John 20 London
# 2 Jane 30 Paris
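A variation on the same idea, not from the original answer, is base R's list2env(), which assigns every element of a named list into an environment in one call. This sketch assumes both groups are non-empty, because the two names are matched by position to the "FALSE" and "TRUE" pieces returned by split():

```r
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

parts <- split(main_df, grepl("J", main_df$Name))

# Name the FALSE piece main_df and the TRUE piece sub_df, then assign both
# into the current environment (the global workspace when run at top level)
invisible(list2env(setNames(parts, c("main_df", "sub_df")), envir = environment()))
```

Targeting the environment explicitly avoids relying on the pos argument of assign().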