
R loop to create multiple new columns based on dataframe name

I'm currently creating an R script to extract certain email attachments from my inbox, load each attachment into a dataframe (with the same name as the file) and then parse the dataframe names into individual elements, which can then be used to create new columns within each dataframe. The dataframes will then be combined with rbind and finally written to a SQL table.

I'm at the stage where I need a loop to loop over the dataframe names, parse them and add them as new columns, but I can't get my loop to work.

I have provided an example of my code below:

df_list <- Filter(function(x) is.data.frame(get(x)), ls())

for(i in df_list){
  i["Filename"]           <- df_list[i]
  i["Campaign_ID"]        <- sapply(strsplit(df_list[i], " "), "[", 1)
  i["Campaign_Name"]      <- str_sub(regmatches(df_list[i], regexpr("(?<=\\ )[^_]+", df_list[i], perl=TRUE)), start = 1, end = str_length(regmatches(df_list[i], regexpr("(?<=\\ )[^_]+", df_list[i], perl=TRUE))) - str_length(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^_]+", df_list[i], perl=TRUE)))-1)
  i["Campaign_Code"]      <- regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE))
  i["Brand"]              <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 1, stop = 4)
  i["Campaign_Type"]      <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 5, stop = 7)
  i["Campaign_Category"]  <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 8, stop = 10)
  i["Campaign_Churn"]     <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 11, stop = 13)
  i["Product"]            <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 14, stop = 16)
  i["Version"]            <- substr(regmatches(df_list[i], regexpr("([A-Z]+[A-Z])[^ -]+", df_list[i], perl=TRUE)), start = 17, stop = 17)
  i["Segment"]            <- regmatches(df_list[i], regexpr("(?<=\\_)[^ -]+", df_list[i], perl=TRUE))
  i["Churn"]              <- regmatches(df_list[i], regexpr("(?<=\\- )[^ -]+", df_list[i], perl=TRUE))
  i["Stage"]              <- regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^\\s]+", df_list[i], perl=TRUE))
  i["Other"]              <- str_sub(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^.]+", df_list[i], perl=TRUE)), start = str_length(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^\\s]+", df_list[i], perl=TRUE)))+2, end = str_length(regmatches(df_list[i], regexpr("([S-S]+[a-z]+[a-z]+[a-z]+[a-z] )[^.]+", df_list[i], perl=TRUE))) - str_length(regmatches(df_list[i], regexpr("\\S+(?=\\.[^.]*$)", df_list[i], perl=TRUE)))-1)
  i["Date"]               <- dmy(regmatches(df_list[i], regexpr("\\S+(?=\\.[^.]*$)", df_list[i], perl=TRUE)))
  print(i)
}

I imagine this is something simple that I am missing within my loop, but I can't seem to figure out what. I have tried this without the parsing, just adding random data, but it still doesn't work.

For clarity, I have also provided the contents of 'df_list' (these are indeed dataframes - they are simply named the same as the file they were derived from for parsing purposes):

[1] "20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv"
[2] "20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv"
[3] "20581 Buzz Testing Nathan 3 BUZZRETJOUCHUALLA_D1C - Churned - Stage 1 Other 28-February-2019.csv"

Edit: I thought I'd add some more reproducible data, which should help clear things up a touch.

`20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 000000, Code = 'ABCDE')

`20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 111111, Code = 'FGHIJ')

`20581 Buzz Testing Nathan 3 BUZZRETJOUCHUALLA_D1C - Churned - Stage 1 Other 28-February-2019.csv` <- data.frame(ID = 222222, Code = 'KLMNO')

I then want to create new columns in each dataframe, populated from elements of the dataframe name. For the first dataframe, for example, the first 5 digits of the dataframe name would be the Campaign_ID. I already have the string splitting for these elements, as shown earlier in my question.

Since I don't have access to your data, I'm going to answer your question very generally, using example data frames and an arbitrary operation that stands in for the column modifications in the for loop of your original post. The structure of my solution is a bit different: instead of using a for loop, I assemble the data frames into a list and use lapply to modify a named column.

df1 <- data.frame(foo = 1:5,
                  bar = c(7, NA, 22, 3, 14),
                  baz = c(TRUE, FALSE, FALSE, NA, TRUE))

df2 <- data.frame(foo = 1:5,
                  bar = c(4, NA, 9, 29, 11),
                  baz = c(TRUE, TRUE, FALSE, NA, TRUE))

df3 <- data.frame(foo = 1:5,
                  bar = c(1, 9, NA, 7, 12),
                  baz = c(FALSE, FALSE, FALSE, NA, FALSE))

dfs <- Filter(function(x) is.data.frame(get(x)), ls())

This next line creates a list whose entries are the data frames. The list entries can be named with names(df_list) <- dfs, or with any other names you choose.

df_list <- lapply(dfs, function(x) eval(as.name(x)))
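As a possible alternative, base R's mget() looks up several objects by name and returns them as a list that is already named after those objects. The sketch below is self-contained; the objects df1, df2 and named_list are illustrative only.

```r
# Two small example data frames in the current environment
df1 <- data.frame(foo = 1:2)
df2 <- data.frame(foo = 3:4)

# Names of all data frames visible from here
dfs <- Filter(function(x) is.data.frame(get(x)), ls())

# mget() returns a list whose element names are the object names
named_list <- mget(dfs)
print(names(named_list))
```

Having the names on the list is handy when each data frame's name carries information, because the name can be passed along with the data frame to whatever function modifies it.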

Once again, since I don't have your original data, I'm applying an arbitrary transformation to the "bar" column of each data frame to illustrate how you might integrate your transformations into this general solution. Here I'm just adding 1 to each non-NA value in the "bar" column. Hopefully I'm not misinterpreting what you aim to accomplish. Post an update or comment if it isn't what you needed or if it doesn't work with your specific data.

df_list <- lapply(seq_along(df_list), function(i) {
  df <- df_list[[i]]
  # Add 1 to every value in "bar"; NA values stay NA
  df[, "bar"] <- df[, "bar"] + 1
  df
})

The output should be a list of data frames with 1 added to each non-NA element of the "bar" column. You could add transformations on other columns in the function being applied with lapply. If having your data frames in a list isn't going to work for you, here's some code that will assign the transformed data frames in the list back to the original data frames in the global environment:

assignment_fun <- function(x, y) {
  assign(x, y, envir = .GlobalEnv)
}

mapply(assignment_fun, dfs, df_list)
df1
df2
df3

You'll get funny-looking output from the mapply line in the console, summarizing the data types of the assigned columns; wrapping the call in invisible() suppresses it. If you then print those data frames from the global environment, they should now match the entries in the transformed data frame list.
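Applied to the setup in the question, the same list-based pattern might look like the sketch below. In the original loop, i is only the name of each data frame (a character string), so i["Filename"] <- ... modifies that string rather than the data frame, and df_list[i] indexes an unnamed character vector by name, which returns NA. The sketch instead keeps the data frames in a list named after their files and passes each data frame together with its name to a helper. The helper add_name_columns and the variables frames and combined are hypothetical names, and only four of the columns are filled in, reusing regular expressions from the question.

```r
# File names from the question; each one doubles as the name of a data frame
file_names <- c(
  "20579 Buzz Testing Nathan 1 BUZZRETJOUCHUALLA_D1A - Churned - Stage 1 Other 28-February-2019.csv",
  "20580 Buzz Testing Nathan 2 BUZZRETJOUCHUALLA_D1B - Churned - Stage 1 Other 28-February-2019.csv",
  "20581 Buzz Testing Nathan 3 BUZZRETJOUCHUALLA_D1C - Churned - Stage 1 Other 28-February-2019.csv"
)

# Stand-ins for the imported attachments, stored in a list named by file
frames <- list(
  data.frame(ID = 0, Code = "ABCDE"),
  data.frame(ID = 111111, Code = "FGHIJ"),
  data.frame(ID = 222222, Code = "KLMNO")
)
names(frames) <- file_names

# Hypothetical helper: add columns derived from the data frame's name
add_name_columns <- function(df, name) {
  code <- regmatches(name, regexpr("([A-Z]+[A-Z])[^ -]+", name, perl = TRUE))
  df[["Filename"]]      <- name
  df[["Campaign_ID"]]   <- sapply(strsplit(name, " "), "[", 1)
  df[["Campaign_Code"]] <- code
  df[["Brand"]]         <- substr(code, start = 1, stop = 4)
  df
}

# Map() walks the data frames and their names in parallel
frames <- Map(add_name_columns, frames, names(frames))

# Stack the augmented data frames into one
combined <- do.call(rbind, unname(frames))
print(combined[, c("ID", "Campaign_ID", "Brand")])
```

The remaining columns from the question could be added inside the helper in the same way, and combined is then a single data frame ready to be written to a database table.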
