Splitting a data frame into multiple data frames

Question

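I need to split a data frame into several data frames based on a regex search. The search does not follow a fixed pattern: sometimes it is a single regular expression, sometimes a combination of several. Here is a minimal example that extracts just one set of rows: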

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")

main_df <- data.frame(Name, Age, City)

sub_df <- main_df %>% 
  filter(grepl("J", Name))

main_df <- main_df %>% 
  filter(!grepl("J", Name))

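Note that I extract some rows into a new data frame and then delete the extracted rows from the main data frame.

I am looking for a one-line command to do this. Help is appreciated, especially if it uses dplyr.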

Answer 1

We can write a function like this:

split_df <- function(df, char) {
  split(df, grepl(char, df$Name))
}

new_df <- split_df(main_df, "J")

new_df[[1]]
#    Name Age     City
#3 Arthur  31 New York
#4 Maggie  33    Delhi

new_df[[2]]
#  Name Age   City
#1 John  20 London
#2 Jane  30  Paris

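Replace char with the appropriate pattern to split on. You can also pass a regular expression as char, such as ^J (starts with J) or J$ (ends with J).

For example,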

new_df <- split_df(main_df, "^J")

gives the same output as above.
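
The question mentions that the search is sometimes a combination of several regular expressions. A minimal sketch of one way to cover that case with the same split() approach follows, assuming the patterns should be combined with OR. The helper name split_by_patterns and its column argument are hypothetical and not part of the answer above.

```r
Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)

# Hypothetical helper: a row counts as a match if ANY pattern matches.
split_by_patterns <- function(df, patterns, column = "Name") {
  # Join the patterns with the regex alternation operator, e.g. "^J|e$"
  combined <- paste(patterns, collapse = "|")
  # split() on a logical vector names the pieces "FALSE" and "TRUE"
  split(df, grepl(combined, df[[column]]))
}

parts <- split_by_patterns(main_df, c("^J", "e$"))
parts[["TRUE"]]   # rows whose Name starts with J or ends with e
parts[["FALSE"]]  # everything else
```

Accessing the pieces by the names "TRUE" and "FALSE" rather than by position makes it harder to mix up the matched and unmatched rows.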

Answer 2

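I think the following lets you extract rows from the original df based on multiple conditions and, as requested, remove them from the original df using dplyr.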

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
# For several conditions, supply one logical column per condition,
# e.g. data.frame(cond1, cond2); c() would concatenate them into one long vector
conditions <- grepl("J", main_df$Name)
extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)  # one column per condition
  # One extracted data frame per condition
  newdfs.list <- lapply(seq_len(ncol(condf)), function(i) x %>% filter(condf[, i]))
  # Rows matching none of the conditions; <<- writes newmain to the global environment
  newmain <<- x %>% filter(rowSums(condf) == 0)
  return(newdfs.list)
}
ndflist <- extractanddelete(main_df, conditions)
newmain
ndflist
> newmain
    Name Age     City
1 Arthur  31 New York
2 Maggie  33    Delhi
> ndflist
[[1]]
  Name Age   City
1 John  20 London
2 Jane  30  Paris

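You get a list with as many elements as there were conditions used for filtering and deleting.

To complete the operation, you can run main_df <- newmain.

This solution also works with conditions other than grepl.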
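
For comparison, here is a base R sketch of the same idea with two conditions that returns the remaining rows instead of assigning them globally. The function name extract_and_remove, the condition names, and the Age > 32 condition are assumptions made up for illustration.

```r
Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)

# Hypothetical helper: `conditions` is a list of logical vectors,
# each with one entry per row of `df`.
extract_and_remove <- function(df, conditions) {
  # One extracted data frame per condition
  extracted <- lapply(conditions, function(keep) df[keep, , drop = FALSE])
  # TRUE for rows that matched at least one condition
  matched_any <- Reduce(`|`, conditions)
  list(extracted = extracted,
       remaining = df[!matched_any, , drop = FALSE])
}

conditions <- list(
  starts_with_J = grepl("^J", main_df$Name),
  over_32 = main_df$Age > 32
)

result <- extract_and_remove(main_df, conditions)
result$extracted$starts_with_J  # John and Jane
result$extracted$over_32        # Maggie
main_df <- result$remaining     # Arthur is the only row left
```

Returning both pieces in one list keeps the function free of side effects, at the cost of one extra assignment by the caller.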

Answer 3

I implemented it with mapply(), which applies assign() across multiple list (vector) arguments.

Note: pos = 1 is necessary.

mapply(FUN = assign, x = c("main_df", "sub_df"),
                     value = split(main_df, grepl("J", main_df$Name)),
                     pos = 1)

main_df

#     Name Age     City
# 3 Arthur  31 New York
# 4 Maggie  33    Delhi

sub_df

#   Name Age   City
# 1 John  20 London
# 2 Jane  30  Paris
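
One caveat worth checking with this approach: split() on a plain logical vector only returns the groups that actually occur, so when no row matches there is a single piece rather than two. The sketch below is one possible way around that, assuming an empty sub_df is the desired result in that case. It fixes the factor levels up front so both pieces always exist, and the pattern "Z" is chosen only because it matches nothing in the example data.

```r
Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)

# Declare both levels explicitly; split() keeps empty levels by default
matched <- factor(grepl("Z", main_df$Name), levels = c(FALSE, TRUE))
parts <- split(main_df, matched)

# Pick the pieces by name instead of relying on their position
sub_df <- parts[["TRUE"]]    # zero rows, same columns as main_df
main_df <- parts[["FALSE"]]  # all four rows remain

nrow(sub_df)
nrow(main_df)
```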

Answer 1
2 Accepted 2018-10-24 10:28:52

Answer 2
1 2018-10-24 10:24:52

Answer 3
1 2018-10-24 10:44:48