Splitting a dataframe into several dataframes

I have a dataframe that I need to split into several dataframes, based on regex searches. There is no set pattern to the searches, i.e. sometimes there is a single regex, sometimes a combination of several. Here is a minimal example with just one set of rows extracted:

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")

main_df <- data.frame(Name, Age, City)

sub_df <- main_df %>% 
  filter(grepl("J", Name))

main_df <- main_df %>% 
  filter(!grepl("J", Name))

Note that I am extracting some rows into a new dataframe, then deleting the extracted rows from the main dataframe.

I am looking for a single-line command to do this. Help appreciated, especially if using dplyr.

We can write a function like this:

split_df <- function(df, char) {
  split(df, grepl(char, df$Name))
}

new_df <- split_df(main_df, "J")

new_df[[1]]
#    Name Age     City
#3 Arthur  31 New York
#4 Maggie  33    Delhi

new_df[[2]]
#  Name Age   City
#1 John  20 London
#2 Jane  30  Paris

In place of char, pass the pattern you want to split on. char can also be a regular expression such as ^J (starts with J) or J$ (ends with J).

For example,

new_df <- split_df(main_df, "^J")

would give the same output as above.
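Since the question mentions that a search is sometimes a combination of several regexes, here is a minimal sketch of how the same split() idea might be extended. The helper name split_any, its column argument, and the labels "rest" and "matched" are hypothetical and not part of the answer above; the patterns are simply joined with | (regex alternation), which assumes "match any of them" is the intended combination.

```r
# Hypothetical helper: split on rows matching ANY of several regex patterns.
split_any <- function(df, patterns, column = "Name") {
  combined <- paste(patterns, collapse = "|")   # e.g. "^J|ie$"
  matched <- grepl(combined, df[[column]])
  # A factor with both levels guarantees both list elements always exist,
  # even when no row (or every row) matches.
  groups <- factor(matched, levels = c(FALSE, TRUE),
                   labels = c("rest", "matched"))
  split(df, groups)
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

parts <- split_any(main_df, c("^J", "ie$"))
parts$matched   # rows for John, Jane and Maggie
parts$rest      # row for Arthur
```

Naming the list elements avoids having to remember that split() puts the FALSE group first.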

I think the following will allow you to extract rows from the original df based on multiple conditions and delete them from the original, using dplyr as requested.

library(dplyr)

Name <- c("John", "Jane", "Arthur", "Maggie")
Age <- c(20, 30, 31, 33)
City <- c("London", "Paris", "New York", "Delhi")
main_df <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
# One logical vector per condition. For several conditions, pass them as
# columns, e.g. data.frame(grepl("J", main_df$Name), main_df$Age > 30)
conditions <- grepl("J", main_df$Name)
extractanddelete <- function(x, conditions) {
  condf <- data.frame(conditions)
  # One extracted dataframe per condition column
  newdfs.list <- lapply(seq_len(ncol(condf)), function(i) x %>% filter(condf[, i]))
  # Rows matching none of the conditions stay in the main dataframe,
  # which is assigned globally as newmain
  newmain <<- x %>% filter(rowSums(condf) == 0)
  return(newdfs.list)
}
ndflist <- extractanddelete(main_df, conditions)
newmain
ndflist
> newmain
    Name Age     City
1 Arthur  31 New York
2 Maggie  33    Delhi
> ndflist
[[1]]
  Name Age   City
1 John  20 London
2 Jane  30  Paris

You receive a list containing as many elements as the conditions you use for filtering and deleting.

For completeness, you can then run main_df <- newmain to update the original dataframe.

This solution also works for conditions other than grepl.
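To illustrate the point about conditions other than grepl, here is a minimal base R sketch that takes any logical vector and returns both pieces in one list instead of assigning to a global variable. The helper name extract_rows and the element names extracted and remaining are hypothetical; dplyr is not needed for this variant.

```r
# Hypothetical helper: split a dataframe by an arbitrary logical vector.
extract_rows <- function(df, keep) {
  keep <- !is.na(keep) & keep            # treat NA as "no match"
  list(extracted = df[keep, , drop = FALSE],
       remaining = df[!keep, , drop = FALSE])
}

main_df <- data.frame(
  Name = c("John", "Jane", "Arthur", "Maggie"),
  Age = c(20, 30, 31, 33),
  City = c("London", "Paris", "New York", "Delhi"),
  stringsAsFactors = FALSE
)

# Any logical expression works, not only grepl()
res <- extract_rows(main_df, main_df$Age > 30 & main_df$City != "Delhi")
res$extracted   # row for Arthur
res$remaining   # rows for John, Jane and Maggie
```

Returning a list keeps the function free of side effects; the caller decides whether to overwrite main_df with res$remaining.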

I achieve it with the mapply() function, which applies assign() over multiple list (vector) arguments.

Note: pos = 1 is necessary so that assign() creates the variables in the global environment.

mapply(FUN = assign, x = c("main_df", "sub_df"),
                     value = split(main_df, grepl("J", main_df$Name)),
                     pos = 1)

main_df

#     Name Age     City
# 3 Arthur  31 New York
# 4 Maggie  33    Delhi

sub_df

#   Name Age   City
# 1 John  20 London
# 2 Jane  30  Paris
