Split a dataframe into several dataframes

Question

I have a dataframe that I need to split into several dataframes, based on regex searches. There is no set pattern to the searches, i.e. sometimes there is a single regex, sometimes a combination of several. Here is a minimal example with just one set of rows extracted:

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")

main_df <- data.frame(Name, Age, City)

sub_df <- main_df %>% 
  filter(grepl("J", Name))

main_df <- main_df %>% 
  filter(!grepl("J", Name))

Note that I am extracting some rows into a new dataframe, then deleting the extracted rows from the main dataframe.

I am looking for a single-line command to do this. Help is appreciated, especially if it uses dplyr.

Answer 1

We can write a function like this:

split_df <- function(df, char) {
  split(df, grepl(char, df$Name))
}

new_df <- split_df(main_df, "J")

new_df[[1]]
#    Name Age     City
#3 Arthur  31 New York
#4 Maggie  33    Delhi

new_df[[2]]
#  Name Age   City
#1 John  20 London
#2 Jane  30  Paris

Replace char with whatever pattern you want to split on. It can be any regular expression, such as ^J (starts with J) or J$ (ends with J).

For example,

new_df <- split_df(main_df, "^J")

would give the same output as above.
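The question also mentions combinations of several regexes. A minimal sketch of one way to handle that, assuming "combination" means a row should match at least one of the patterns: join the patterns with | before calling grepl(). The helper name split_df_any is hypothetical, and the factor levels are fixed so that both groups exist even when nothing matches.

```r
# Sample data from the question
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on rows whose Name matches ANY of the patterns
split_df_any <- function(df, patterns) {
  combined <- paste(patterns, collapse = "|")  # regex alternation
  # Fixed levels keep both groups in the result, even if one is empty
  matched <- factor(grepl(combined, df$Name), levels = c("FALSE", "TRUE"))
  split(df, matched)
}

parts <- split_df_any(main_df, c("^J", "e$"))
parts[["TRUE"]]   # rows matching at least one pattern
parts[["FALSE"]]  # the remaining rows
```

Indexing the result by "TRUE" and "FALSE" rather than by position avoids having to remember which group comes first.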

Answer 2

I think the following will let you extract rows from the original df based on multiple conditions and delete them from the original, using dplyr as requested.

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
# One logical vector per condition; for several conditions, bind them column-wise with cbind()
conditions <- grepl("J", main_df$Name)
extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)
  #fullcondition <- sapply(conditions, all)
  newdfs.list <- lapply(1:ncol(condf), function(i) x %>% filter(condf[,i]))
  newmain <<- x
  notcondf <- !condf
  sapply(1:ncol(condf), function(i) newmain <<- newmain %>% filter(notcondf[,i]))
  return(newdfs.list)
}
ndflist <- extractanddelete(main_df, conditions)
newmain
#     Name Age     City
# 1 Arthur  31 New York
# 2 Maggie  33    Delhi

ndflist
# [[1]]
#   Name Age   City
# 1 John  20 London
# 2 Jane  30  Paris

You receive a list containing one element per condition used for filtering and deleting.

For completeness, you can then do main_df <- newmain to update the original dataframe.

This solution also works for conditions other than grepl.
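As a variation on the same idea, here is a rough sketch that avoids the global assignment with <<- by returning both the extracted subsets and the remaining rows. It uses base R subsetting to stay self-contained (dplyr's filter() would work equally well), and the helper name extract_and_remove and the condition names are hypothetical. It also illustrates a condition that is not a regex.

```r
# Sample data from the question
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 'conditions' is a named list of logical vectors,
# one per subset to extract, each as long as nrow(df)
extract_and_remove <- function(df, conditions) {
  extracted <- lapply(conditions, function(cond) df[cond, , drop = FALSE])
  # A row stays in the remainder only if no condition matched it
  matched_any <- Reduce(`|`, conditions)
  list(extracted = extracted, remaining = df[!matched_any, , drop = FALSE])
}

result <- extract_and_remove(
  main_df,
  list(j_names = grepl("J", main_df$Name), over_32 = main_df$Age > 32)
)

result$extracted$j_names  # rows whose Name contains "J"
result$extracted$over_32  # rows with Age above 32
main_df <- result$remaining
```

Returning a list keeps the function free of side effects, at the cost of one extra assignment to update main_df.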

Answer 3

I do it with mapply(), which applies assign() across several vector or list arguments in parallel.

Note: pos = 1 is necessary, so that assign() writes into the global environment (the first entry on the search path) rather than into a temporary frame inside mapply().

mapply(FUN = assign, x = c("main_df", "sub_df"),
                     value = split(main_df, grepl("J", main_df$Name)),
                     pos = 1)

main_df

#     Name Age     City
# 3 Arthur  31 New York
# 4 Maggie  33    Delhi

sub_df

#   Name Age   City
# 1 John  20 London
# 2 Jane  30  Paris
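The mapply() call relies on split() returning the FALSE group before the TRUE group. A small sketch of a more explicit alternative, assuming two plain assignments are acceptable instead of a single call: pick each group by name.

```r
# Sample data from the question
main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age  = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# split() on a logical vector names its groups "FALSE" and "TRUE"
parts <- split(main_df, grepl("J", main_df$Name))

sub_df  <- parts[["TRUE"]]   # extracted rows
main_df <- parts[["FALSE"]]  # rows left in the main dataframe
```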

Answer 1
2 accepted 2018-10-24 10:28:52

Answer 2
1 2018-10-24 10:24:52

Answer 3
1 2018-10-24 10:44:48
