简体   繁体   中英

r function or loop to create new columns and calculate values based upon limits

I currently use 40 lines of code to create and calculate new columns if certain conditions are met. I am trying to come up w/ a way to turn all of this code into either a loop or function to simplify my script.

Here is some sample data:

set.seed(1)
dat <- data.frame(sc1 = sample(LETTERS[1:6],15,replace=T),
                  sc1_n = sample (1:100,15),
                  sc2 = sample(LETTERS[1:6],15,replace=T),
                  sc2_n = sample (1:100,15),
                  sc3 = sample(LETTERS[1:6],15,replace=T),
                  sc3_n = sample (1:100,15),
                  ec1 = sample(LETTERS[1:6],15,replace=T),
                  ec1_n = sample (1:100,15),
                  ec2 = sample(LETTERS[1:6],15,replace=T),
                  ec2_n = sample (1:100,15),
                  ec3 = sample(LETTERS[1:6],15,replace=T),
                  ec3_n = sample (1:100,15),
                  area = sample (1:100,15))

I iterate through each unique value of sc1 (AF, n=6), sc2 (AF, n=6), and sc3 (AF, n=6) to calculate a value, then add the unique values together to create another column, called A, B, C, D, E, or F, with 's' appended after to indicate it was a value for s, and not e, which I also iterate through after I finish with sc1, sc2, and sc3.

Here are the 40 lines of code I currently use to generate the columns and values I need:

dat <- transform(dat,A1s = (sc1_n * 0.01) * (area) * (sc1 == "A")) #create new column A1s, and calculates a number if sc1=='A'
dat <- transform(dat,A2s = (sc2_n * 0.01) * (area) * (sc2 == "A")) #create new column A2s, and calculates a number if sc2=='A'
dat <- transform(dat,A3s = (sc3_n * 0.01) * (area) * (sc3 == "A")) #same as above, except A3s and where sc3='A'
dat <- transform(dat,As = A1s + A2s + A3s) #I really don't need A1s, A2s, or A3s, except to calculate this column, As
dat <- transform(dat,B1s = (sc1_n * 0.01) * (area) * (sc1 == "B"))
dat <- transform(dat,B2s = (sc2_n * 0.01) * (area) * (sc2 == "B"))
dat <- transform(dat,B3s = (sc3_n * 0.01) * (area) * (sc3 == "B"))
dat <- transform(dat,Bs = B1s + B2s + B3s)
dat <- transform(dat,C1s = (sc1_n * 0.01) * (area) * (sc1 == "C"))
dat <- transform(dat,C2s = (sc2_n * 0.01) * (area) * (sc2 == "C"))
dat <- transform(dat,C3s = (sc3_n * 0.01) * (area) * (sc3 == "C"))
dat <- transform(dat,Cs = C1s + C2s + C3s)
dat <- transform(dat,D1s = (sc1_n * 0.01) * (area) * (sc1 == "D"))
dat <- transform(dat,D2s = (sc2_n * 0.01) * (area) * (sc2 == "D"))
dat <- transform(dat,D3s = (sc3_n * 0.01) * (area) * (sc3 == "D"))
dat <- transform(dat,Ds = D1s + D2s + D3s)
dat <- transform(dat,E1s = (sc1_n * 0.01) * (area) * (sc1 == "E"))
dat <- transform(dat,E2s = (sc2_n * 0.01) * (area) * (sc2 == "E"))
dat <- transform(dat,E3s = (sc3_n * 0.01) * (area) * (sc3 == "E"))
dat <- transform(dat,Es = E1s + E2s + E3s)
dat <- transform(dat,F1s = (sc1_n * 0.01) * (area) * (sc1 == "F"))
dat <- transform(dat,F2s = (sc2_n * 0.01) * (area) * (sc2 == "F"))
dat <- transform(dat,F3s = (sc3_n * 0.01) * (area) * (sc3 == "F"))
dat <- transform(dat,Fs = F1s + F2s + F3s)

dat <- transform(dat,A1e = (ec1_n * 0.01) * (area) * (ec1 == "A"))
dat <- transform(dat,A2e = (ec2_n * 0.01) * (area) * (ec2 == "A"))
dat <- transform(dat,A3e = (ec3_n * 0.01) * (area) * (ec3 == "A"))
dat <- transform(dat,Ae = A1e + A2e + A3e)
dat <- transform(dat,B1e = (ec1_n * 0.01) * (area) * (ec1 == "B"))
dat <- transform(dat,B2e = (ec2_n * 0.01) * (area) * (ec2 == "B"))
dat <- transform(dat,B3e = (ec3_n * 0.01) * (area) * (ec3 == "B"))
dat <- transform(dat,Be = B1e + B2e + B3e)
dat <- transform(dat,C1e = (ec1_n * 0.01) * (area) * (ec1 == "C"))
dat <- transform(dat,C2e = (ec2_n * 0.01) * (area) * (ec2 == "C"))
dat <- transform(dat,C3e = (ec3_n * 0.01) * (area) * (ec3 == "C"))
dat <- transform(dat,Ce = C1e + C2e + C3e)
dat <- transform(dat,D1e = (ec1_n * 0.01) * (area) * (ec1 == "D"))
dat <- transform(dat,D2e = (ec2_n * 0.01) * (area) * (ec2 == "D"))
dat <- transform(dat,D3e = (ec3_n * 0.01) * (area) * (ec3 == "D"))
dat <- transform(dat,De = D1e + D2e + D3e)
dat <- transform(dat,E1e = (ec1_n * 0.01) * (area) * (ec1 == "E"))
dat <- transform(dat,E2e = (ec2_n * 0.01) * (area) * (ec2 == "E"))
dat <- transform(dat,E3e = (ec3_n * 0.01) * (area) * (ec3 == "E"))
dat <- transform(dat,Ee = E1e + E2e + E3e)
dat <- transform(dat,F1e = (ec1_n * 0.01) * (area) * (ec1 == "F"))
dat <- transform(dat,F2e = (ec2_n * 0.01) * (area) * (ec2 == "F"))
dat <- transform(dat,F3e = (ec3_n * 0.01) * (area) * (ec3 == "F"))
dat <- transform(dat,Fe = F1e + F2e + F3e)

I'm sure there must be a way to smartly and efficiently do this via creating lists and loops or at least a function, but I've been looking and haven't found a way.

-al

How about a transformation like this

for(p in c("s","e")) {
   g <- dat[, paste0(p, "c",1:3)]
   n <- dat[, paste0(p, "c",1:3,"_n")]
   for(x in LETTERS[1:5]) {
       dat[, paste0(x,p) ] <- rowSums(n * 0.01 * (g==x) * dat$area)
   }
}

Here we loop over the different sets for the "s" and "e" prefix, and we extract the subset of columns related to that prefix. Next we loop over all the groups and calculate the sum of the rows for that group. Here we are trying to take advantage of as much of the information stored in the column name as possible. This will not create the temporary columns you don't need (A1s, A2s, etc)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM