
Dynamically Splitting and naming the dataframe

I have a data frame called "loan" that contains all the loan information: customer ID, loan amount, term of loan, etc.

There is a column called "yq", which holds the year and quarter of the loan disbursement date:

    ID      yq
    1       2014 Q4
    2       2014 Q4
    3       2015 Q1
    4       2015 Q2
    5       2015 Q3  
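To make the code below reproducible, here is a minimal sketch of the "loan" data frame built from the rows shown above. Only ID and yq come from the question; the amount column is hypothetical and only there so later examples have something to summarise.

```r
# Reproducible stand-in for the "loan" data frame from the question.
# ID and yq match the sample rows; amount is a made-up column for illustration.
loan <- data.frame(
  ID     = 1:5,
  yq     = c("2014 Q4", "2014 Q4", "2015 Q1", "2015 Q2", "2015 Q3"),
  amount = c(1000, 2500, 1800, 3000, 1200),  # hypothetical loan amounts
  stringsAsFactors = FALSE
)
str(loan)
```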

I wanted to split the data frame by year and quarter, so naturally I used the following:

 list_of_dataframes <- split(loan,                  
                       with(loan, yq), 
                       drop = TRUE)
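For reference, split() already names each element of the resulting list after its group value, so elements can be fetched by quarter rather than by position. A small sketch with hypothetical data matching the sample rows:

```r
# Hypothetical data matching the sample rows in the question
loan <- data.frame(
  ID = 1:5,
  yq = c("2014 Q4", "2014 Q4", "2015 Q1", "2015 Q2", "2015 Q3"),
  stringsAsFactors = FALSE
)

list_of_dataframes <- split(loan, loan$yq, drop = TRUE)

# split() names each list element after its group value
names(list_of_dataframes)
# [1] "2014 Q4" "2015 Q1" "2015 Q2" "2015 Q3"

# An element can therefore be fetched by its quarter instead of a position
list_of_dataframes[["2015 Q1"]]
```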

But this leaves me hard-coding each split dataset, like so:

      loan_2014_q4 <- list_of_dataframes[[1]]

      loan_2015_q1 <- list_of_dataframes[[2]]

Is there a better way to do this, for example naming each data frame "loan" followed by its yq value? Also, the number of data frames to be saved is dynamic.

Basically, I am trying to automate the process so that the data is split and each resulting data frame is named and saved automatically, however many there are.

Thanks in advance

Since you asked to automate this, run the following after creating list_of_dataframes:

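# Take names from the list itself so each name matches its data frame,
# and replace spaces so the objects get syntactic names like loan_2014_Q4
temp <- gsub("\\s+", "_", names(list_of_dataframes))
for (i in seq_along(list_of_dataframes)) {
    assign(paste0("loan_", temp[i]), list_of_dataframes[[i]])
}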
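A loop that pairs names from unique(loan$yq) with list positions can mislabel data: unique() returns values in order of first appearance, while split() orders groups by their sorted factor levels, so the two disagree whenever the data is not already sorted by yq. A small sketch with hypothetical, deliberately unsorted rows shows the mismatch and a loop that takes the names from the list itself:

```r
# Hypothetical rows, deliberately out of chronological order
loan <- data.frame(
  ID = 1:4,
  yq = c("2015 Q2", "2014 Q4", "2015 Q1", "2014 Q4"),
  stringsAsFactors = FALSE
)
list_of_dataframes <- split(loan, loan$yq, drop = TRUE)

unique(loan$yq)            # order of first appearance
# [1] "2015 Q2" "2014 Q4" "2015 Q1"
names(list_of_dataframes)  # sorted group order used by split()
# [1] "2014 Q4" "2015 Q1" "2015 Q2"

# Looping over the list's own names keeps each name paired with its data
for (nm in names(list_of_dataframes)) {
  assign(paste0("loan_", gsub("\\s+", "_", nm)), list_of_dataframes[[nm]])
}
loan_2015_Q2$ID
# [1] 1
```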

I would ultimately recommend keeping your data in a list (or not splitting at all if you use tools like "data.table" and "dplyr", which give you extremely flexible subsetting options).
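As a sketch of why a list is convenient: functions like sapply() run once per quarter no matter how many quarters there are, and base R's tapply() can aggregate by yq without splitting at all (a base R analogue of the grouped operations data.table and dplyr offer). The amount column is hypothetical.

```r
# Hypothetical sample data; the amount column is invented for illustration
loan <- data.frame(
  ID     = 1:5,
  yq     = c("2014 Q4", "2014 Q4", "2015 Q1", "2015 Q2", "2015 Q3"),
  amount = c(1000, 2500, 1800, 3000, 1200),
  stringsAsFactors = FALSE
)
list_of_dataframes <- split(loan, loan$yq, drop = TRUE)

# Per-quarter summaries for any number of quarters, with no new variables
rows_per_quarter  <- sapply(list_of_dataframes, nrow)
total_per_quarter <- sapply(list_of_dataframes, function(d) sum(d$amount))

# The same totals without splitting first
total_no_split <- tapply(loan$amount, loan$yq, sum)

print(rows_per_quarter)
print(total_per_quarter)
print(total_no_split)
```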

However, if you really feel you need separate data.frames, try the following:

## Assume your data.frame is called "mydf"....
temp <- split(mydf, mydf$yq, drop = TRUE)
ls()
# [1] "mydf" "temp"
temp
# $`2014 Q4`
#   ID      yq
# 1  1 2014 Q4
# 2  2 2014 Q4
# 
# $`2015 Q1`
#   ID      yq
# 3  3 2015 Q1
# 
# $`2015 Q2`
#   ID      yq
# 4  4 2015 Q2
# 
# $`2015 Q3`
#   ID      yq
# 5  5 2015 Q3

Now, use list2env to put each list item into the global environment as its own object. Modify the list's names first: prefix them with "loan_" and replace the spaces with underscores so the resulting object names are syntactic.

list2env(setNames(temp, sprintf("loan_%s", gsub("\\s+", "_", names(temp)))), .GlobalEnv)
# <environment: R_GlobalEnv>
ls()
# [1] "loan_2014_Q4" "loan_2015_Q1" "loan_2015_Q2" "loan_2015_Q3" "mydf" "temp"
loan_2014_Q4
#   ID      yq
# 1  1 2014 Q4
# 2  2 2014 Q4
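The question also asks for the pieces to be saved automatically. A minimal sketch, assuming CSV files are acceptable: loop over the named list and write one file per quarter with write.csv(). The output folder (tempdir() here) is an assumption; point out_dir at your own directory.

```r
# Hypothetical data matching the sample rows in the question
loan <- data.frame(
  ID = 1:5,
  yq = c("2014 Q4", "2014 Q4", "2015 Q1", "2015 Q2", "2015 Q3"),
  stringsAsFactors = FALSE
)
temp <- split(loan, loan$yq, drop = TRUE)

out_dir <- tempdir()  # assumption: replace with your own output folder
for (nm in names(temp)) {
  # e.g. "2014 Q4" -> ".../loan_2014_Q4.csv"
  file_name <- file.path(out_dir, sprintf("loan_%s.csv", gsub("\\s+", "_", nm)))
  write.csv(temp[[nm]], file_name, row.names = FALSE)
}
list.files(out_dir, pattern = "^loan_.*\\.csv$")
```

Saving with saveRDS() instead would keep column types exactly; CSV is used here only because it is readable outside R.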
