R for-loop with assignment name vector and function

Question

Given a character vector of names, I would like to loop over it, run a function for each name, and assign each result to a variable named after it.

uprop is a data.frame (1000 observations and 20 columns), as shown in the output below:

> class(uprop)

[1] "data.frame"

Department, Source, Target, and WeightCount are all column names in uprop.

Let us say we need to simplify this repetitive task:

CAST_uprop_data <- subset(uprop, Department == "CAST", select = c(Source, Target, WeightCount))
CHEG_uprop_data <- subset(uprop, Department == "CHEG", select = c(Source, Target, WeightCount))
PHYS_uprop_data <- subset(uprop, Department == "PHYS", select = c(Source, Target, WeightCount))

Here CAST_uprop_data is also a data.frame (100 observations and 3 columns). I can create a character vector cust_dept_list with the department names:

cust_dept_list <- c('CAST', 'CHEG', 'PHYS')

but I cannot figure out how to loop through the names, run the subset for each one, and assign each result to its own variable.

Here is my attempt:

for (i in c(cust_dept_list)){
  print(paste0(i,"_uprop_data")) <- subset(uprop, Department == i, select = c(Source, Target, WeightCount)), i
}

Thanks in advance for helping a novice.

Answer 1

Don't create a bunch of different variables; create a named list of data.frames instead with

cust_dept_list <- c('CAST', 'CHEG', 'PHYS')
# setNames() names the input vector, so lapply() returns a list named by department
uprop_data <- lapply(setNames(cust_dept_list, cust_dept_list), function(x) 
    subset(uprop, Department == x, select = c(Source, Target, WeightCount))
)

and then you can access the data.frames with

uprop_data[["CAST"]]
uprop_data[["CHEG"]]
...

and it will be easier to loop functions over these data sets in a list for future analyses. See the related responses at "How do I make a list of data.frames".
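As a rough sketch of what "looping functions over these data sets" can look like, the example below builds a small made-up `uprop` (the real data is not shown in the question, so the values and the extra `Other` column are assumptions) and uses base R's `split()` as an alternative way to get a list named by department. It then applies a function to every element with `sapply()`.

```r
# Hypothetical stand-in for the asker's `uprop`; values are illustrative only.
set.seed(42)
uprop <- data.frame(
  Department  = rep(c("CAST", "CHEG", "PHYS", "MATH"), times = c(3, 2, 4, 1)),
  Source      = sample(letters, 10),
  Target      = sample(letters, 10),
  WeightCount = sample(1:5, 10, replace = TRUE),
  Other       = runif(10),
  stringsAsFactors = FALSE
)

cust_dept_list <- c("CAST", "CHEG", "PHYS")

# Keep only the wanted departments, then split the three columns of interest
# into a list with one data.frame per department (names come from Department).
keep <- uprop[uprop$Department %in% cust_dept_list, ]
uprop_data <- split(keep[, c("Source", "Target", "WeightCount")], keep$Department)

# Apply a function to every department's data.frame in one call.
rows_per_dept   <- sapply(uprop_data, nrow)
weight_per_dept <- sapply(uprop_data, function(d) sum(d$WeightCount))
```

Because the list is named, `uprop_data[["CAST"]]` and `rows_per_dept["CAST"]` both look up by department, and adding a department only means extending `cust_dept_list`.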

Answer 2

Assigning global variables by looping through subsets is rarely the right approach. I would recommend learning the tidyverse instead.

If anything below is unfamiliar, please look it up: the %>% (pipe) operator will save you a lot of time and effort, and it makes code more readable for others.

You will use a "tibble", which is very similar to a data.frame. You group by department and then nest, which gives one row per department with all of that department's data stored in a list column.

library(tidyverse)

unprop_data = data.frame(Department = c(rep("CAST",1000),rep("CHEG",1000),rep("PHYS",1000)),
                     Source = rnorm(3000),
                     Target = rnorm(3000),
                     WeightCount = rnorm(3000))

grouped_data = unprop_data %>%
  group_by(Department) %>%
  select(Source, Target, WeightCount) %>%
  nest()

The result follows:

> grouped_data
# A tibble: 3 x 2
  Department                 data
      <fctr>               <list>
1       CAST <tibble [1,000 x 3]>
2       CHEG <tibble [1,000 x 3]>
3       PHYS <tibble [1,000 x 3]>
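Once the data is nested, each department's rows sit in the `data` list column. As a minimal sketch (using a small made-up data frame called `toy`, not the answer's `unprop_data`, and assuming dplyr and tidyr are installed), one department's tibble can be pulled out, or a function applied to all of them, like this:

```r
library(dplyr)
library(tidyr)

# Small made-up data; `toy` and its values are illustrative only.
toy <- data.frame(
  Department  = rep(c("CAST", "CHEG", "PHYS"), each = 4),
  Source      = 1:12,
  Target      = 12:1,
  WeightCount = rep(1:3, times = 4)
)

# One row per department; the remaining columns are stored in `data`.
nested <- toy %>%
  group_by(Department) %>%
  nest()

# Pull out the nested tibble for a single department.
cast_rows <- nested$data[[match("CAST", nested$Department)]]

# Apply a function to every nested tibble and name the results by department.
rows_per_dept <- setNames(sapply(nested$data, nrow),
                          as.character(nested$Department))
```

Here `cast_rows` plays the role of `CAST_uprop_data` from the question, without creating a separate variable for every department.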

If for some reason you need to print all of these within a for loop (which is a lot of output at 1,000 rows per department), it would look like this:

for(dept in unique(grouped_data$Department)){
  print(dept)
  print("###########################")
  print(
    grouped_data %>% 
      filter(Department == dept) %>%
      unnest(data)  # name the list column explicitly; a bare unnest() warns in tidyr >= 1.0
  )
}

This returns:

[1] "CAST"
[1] "###########################"
# A tibble: 1,000 x 4
   Department     Source      Target WeightCount
       <fctr>      <dbl>       <dbl>       <dbl>
 1       CAST -0.3781853 -0.59457662   0.2796963
 2       CAST  0.7261541 -1.06344758   1.1874874
 3       CAST -0.1207312  0.56961950   0.2082236
 4       CAST -1.5467661  1.23693964  -0.9732976
 5       CAST -1.6626831  0.09252543  -0.3003913
 6       CAST -0.2783635 -0.84363946   2.0588511
 7       CAST  1.6981061  0.13755764  -0.3935691
 8       CAST  0.4900337 -0.73662209   0.8861508
 9       CAST  0.3971949 -0.23047428   1.6226582
10       CAST  0.7721574 -0.69117961  -0.4547899
# ... with 990 more rows
[1] "CHEG"
[1] "###########################"
# A tibble: 1,000 x 4
   Department     Source     Target WeightCount
       <fctr>      <dbl>      <dbl>       <dbl>
 1       CHEG -0.7843984 -0.8788216  0.60030359
 2       CHEG -0.5636669 -2.2283878 -0.16178492
 3       CHEG  0.9024084 -1.5052453 -1.58803972
 4       CHEG  1.7662237  1.2125255 -0.91229428
 5       CHEG  0.3950654 -0.8283651  0.07402481
 6       CHEG  0.3928973 -1.3650744 -0.75262682
 7       CHEG  1.1298127  1.4765888 -0.76059162
 8       CHEG  0.4787867  0.6041770 -1.23313321
 9       CHEG -1.4474401 -0.6747809  0.78431441
10       CHEG  0.6463868  0.2558378 -1.34131546
# ... with 990 more rows
[1] "PHYS"
[1] "###########################"
# A tibble: 1,000 x 4
   Department     Source      Target WeightCount
       <fctr>      <dbl>       <dbl>       <dbl>
 1       PHYS  0.1425978 -1.01397581 -0.16573546
 2       PHYS -1.2572684 -1.13069956 -0.61870063
 3       PHYS  1.2089882  1.51020970 -1.43474343
 4       PHYS -0.6357010 -0.07362852  0.06683348
 5       PHYS -1.6402587 -1.35273300  0.14436313
 6       PHYS -0.9408105 -1.52515527 -0.06860152
 7       PHYS  0.3143868  0.11814597 -0.37823801
 8       PHYS -0.3232879  0.15408677 -0.62820531
 9       PHYS  0.3152122 -0.72634466 -1.71955337
10       PHYS  0.7268282 -0.20872075  0.30780981
# ... with 990 more rows
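
For the rare case where separate variables really are wanted, base R's `assign()` can create a variable whose name is built from a string. The attempt in the question fails because a call such as `print(paste0(...))` cannot appear on the left-hand side of `<-` to create a variable. The following is only a sketch, using a small made-up `uprop` since the real data is not shown:

```r
# Hypothetical stand-in for the asker's `uprop`; values are illustrative only.
uprop <- data.frame(
  Department  = rep(c("CAST", "CHEG", "PHYS"), each = 2),
  Source      = c("a", "b", "c", "d", "e", "f"),
  Target      = c("b", "c", "d", "e", "f", "a"),
  WeightCount = 1:6,
  stringsAsFactors = FALSE
)

cust_dept_list <- c("CAST", "CHEG", "PHYS")

for (dept in cust_dept_list) {
  # Rows for this department, restricted to the three columns of interest.
  dept_rows <- uprop[uprop$Department == dept,
                     c("Source", "Target", "WeightCount")]
  # assign() takes the variable name as a string, e.g. "CAST_uprop_data".
  assign(paste0(dept, "_uprop_data"), dept_rows)
}
```

After the loop, `CAST_uprop_data`, `CHEG_uprop_data`, and `PHYS_uprop_data` exist as separate data.frames. The list-based approaches are usually easier to work with afterwards, because later code does not have to rebuild variable names with `paste0()` and `get()`.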

Answer 1
3 ACCEPTED 2017-08-03 20:16:11

Answer 2
1 2017-08-03 21:02:22
