
R loop to create multiple new columns based on dataframe name

I'm currently creating an R script to extract certain email attachments from my inbox, drop the attachments into dataframes (with the same name as the file name) and then parse the dataframe names into individual elements, which can then be used to create new columns within the dataframe. These will then be rbind-ed and finally dropped into a SQL table.

I'm at the stage where I need a loop to iterate over the dataframe names, parse them and add the parsed elements as new columns, but I can't get my loop to work.

I have provided an example of my code below:

df_list <- Filter(function(x) is.data.frame(get(x)), ls())

for(i in df_list){
  i["Filename"]           <- df_list[i]
  i["Campaign_ID"]        <- sapply(strsplit(df_list[i], " "), "[", 1)
  i["Campaign_Name"]      <- str_sub(regmatches(df_list[i], regexpr("(?<=\\ )[^_]+", df_list[i], perl=TRUE)), start = 1, end = str_length(regmatches(df_list[i], regexpr("(?<=\\ )[^_]+", df_list[i], perl=TRUE))) - str_length(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^_]+", df_list[i], perl=TRUE)))-1)
  i["Campaign_Code"]      <- regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE))
  i["Brand"]              <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 1, stop = 4)
  i["Campaign_Type"]      <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 5, stop = 7)
  i["Campaign_Category"]  <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 8, stop = 10)
  i["Campaign_Churn"]     <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 11, stop = 13)
  i["Product"]            <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 14, stop = 16)
  i["Version"]            <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 17, stop = 17)
  i["Segment"]            <- regmatches(df_list[i], regexpr("(?<=\\_)[^ -]+", df_list[i], perl=TRUE))
  i["Churn"]              <- regmatches(df_list[i], regexpr("(?<=\\- )[^ -]+", df_list[i], perl=TRUE))
  i["Stage"]              <- regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^\\s]+", df_list[i], perl=TRUE))
  i["Other"]              <- str_sub(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^.]+", df_list[i], perl=TRUE)), start = str_length(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^\\s]+", df_list[i], perl=TRUE)))+2, end = str_length(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^.]+", df_list[i], perl=TRUE))) - str_length(regmatches(df_list[i], regexpr("\\S+(?=\\.[^.]*$)", df_list[i], perl=TRUE)))-1)
  i["Date"]               <- dmy(regmatches(df_list[i], regexpr("\\S+(?=\\.[^.]*$)", df_list[i], perl=TRUE)))
  print(i)
}

I imagine this is something simple that I am missing within my loop, but I can't seem to figure out what. I have tried this without the parsing, just adding random data, but it still doesn't work.

For clarity, I have also provided the contents of 'df_list' (these are indeed dataframes - they are simply named the same as the file they were derived from for parsing purposes):

[1] "20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv"
[2] "20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv"
[3] "20581 Buzz Testing Nathan 3 BUZZRETJOUCHUALLA_D1C - Churned - Stage 1 Other 28-February-2019.csv"

Edit: I thought I'd add some more reproducible data, which should help clear things up a touch.

`20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 000000, Code = 'ABCDE')

`20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 111111, Code = 'FGHIJ')

`20581 Buzz Testing Nathan 3 BUZZRETJOUCHUALLA_D1C - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 222222, Code = 'KLMNO')

Then, in each dataframe, create new columns, using elements of the dataframe name to populate them. For the first dataframe, for example, the first 5 digits of the dataframe name would be the campaignID. I already have string splitting for these elements, as referenced earlier in my question.

Since I don't have access to your data, I'm going to try to answer your question very generally, with example data frames and an arbitrary operation meant to represent the column modifications you specified in the for loop in the original post. The structure of my solution is a bit different. Instead of using a for loop, I assemble the data frames into a list and use lapply to modify a named column.
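As background on why the loop in the question has no effect, here is a minimal sketch. The data frame `sales` and the vector `df_names` are hypothetical stand-ins, not objects from the question. The point is that `Filter(..., ls())` returns names, so inside the loop `i` is a character string, and `i["Filename"] <- ...` changes that string rather than the data frame it names.

```r
# Hypothetical stand-in for one of the data frames in the question
sales <- data.frame(ID = 1)
df_names <- "sales"   # what Filter(..., ls()) returns: names, not data frames

# Indexing an unnamed character vector by name finds nothing
df_names["sales"]     # NA

for (i in df_names) {
  # `i` is the string "sales", so this adds a named element to a character vector
  i["Filename"] <- i
}
is.character(i)       # TRUE
names(sales)          # "ID": the data frame itself was never touched

# One way to modify a data frame through its name: get(), change, assign()
for (nm in df_names) {
  df <- get(nm)
  df[["Filename"]] <- nm
  assign(nm, df)
}
names(sales)          # "ID" "Filename"
```

The get()/assign() pattern works, but keeping the data frames in a list, as described in the rest of this answer, is usually easier to maintain.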

df1 <- data.frame(foo = 1:5,
                  bar = c(7, NA, 22, 3, 14),
                  baz = c(T, F, F, NA, T))

df2 <- data.frame(foo = 1:5,
                  bar = c(4, NA, 9, 29, 11),
                  baz = c(T, T, F, NA, T))

df3 <- data.frame(foo = 1:5,
                  bar = c(1, 9, NA, 7, 12),
                  baz = c(F, F, F, NA, F))

dfs <- Filter(function(x) is.data.frame(get(x)), ls())

This next line will create a list whose entries are the data frames. The list is unnamed at this point; names could be set with names(df_list) <- c( your names here )

df_list <- lapply(dfs, function(x) eval(as.name(x)))
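As a possible alternative, base R's mget() looks up several objects by name and returns them as a named list, which keeps each data frame attached to its name. A minimal sketch, with two hypothetical data frames and the names written out by hand rather than taken from ls():

```r
# Hypothetical example data frames
df1 <- data.frame(foo = 1:2)
df2 <- data.frame(foo = 3:4)

dfs <- c("df1", "df2")

# mget() returns a list whose names are the object names
df_list <- mget(dfs)
names(df_list)   # "df1" "df2"
```

Having the names on the list is convenient when the name itself carries information, as the file names do in the question.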

Once again, since I don't have your original data, I'm applying an arbitrary transformation to the "bar" column of each data frame to illustrate how you might integrate your transformations into this general solution. Here I'm just adding 1 to each non-NA value in the "bar" column (NA values stay NA). Hopefully I'm not misinterpreting what you aim to accomplish. Post an update or a comment if it isn't what you needed or if it doesn't work with your specific data.

df_list <- lapply(seq_along(df_list), function(i) {
  df <- df_list[[i]]
  # Add 1 to every value in "bar"; NA + 1 is still NA
  df[, "bar"] <- df[, "bar"] + 1
  # Return the modified data frame so lapply collects it
  df
})

The output should be a list of data frames with 1 added to each non-NA element of "bar". You could add transformations on other columns inside the function being applied with lapply. If keeping your data frames in a list isn't going to work for you, here's some code that assigns the transformed data frames in the list back to the original data frames in the global environment:

assignment_fun <- function(x, y) {
  assign(x, y, envir = .GlobalEnv)
}

mapply(assignment_fun, dfs, df_list)
df1
df2
df3

You'll get a funny-looking output from the mapply line in the console summarizing the data types of the assignments (wrapping the call in invisible() suppresses it). If you then call those data frames in the global environment, they should match the entries in the transformed data frame list.
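To connect this back to the question, here is a rough sketch of the same list-based idea applied to two of the file names from the original post. The helper `add_name_columns` is a hypothetical name, and the two `sub()` patterns are simplified stand-ins for the asker's own regular expressions; only three of the columns are shown.

```r
# Two data frames from the question, stored in a list named after the files
df_list <- list(
  "20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv" =
    data.frame(ID = 0, Code = "ABCDE"),
  "20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv" =
    data.frame(ID = 111111, Code = "FGHIJ")
)

# Hypothetical helper: add columns derived from the name `nm` to data frame `df`
add_name_columns <- function(df, nm) {
  df$Filename    <- nm
  df$Campaign_ID <- sub(" .*$", "", nm)               # text before the first space
  df$Segment     <- sub("^.*_([^ -]+).*$", "\\1", nm) # text right after the underscore
  df
}

# Map() passes each data frame together with its own name
df_list <- Map(add_name_columns, df_list, names(df_list))

# Stack everything into one data frame, ready to be written to a table
combined <- do.call(rbind, unname(df_list))
combined[, c("ID", "Campaign_ID", "Segment")]
```

Because each data frame travels with its name through Map(), no get() or assign() is needed, and the rbind step mentioned in the question becomes a single call.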
