
R function or loop to create new columns and calculate values based upon limits

I currently use 40 lines of code to create and calculate new columns when certain conditions are met. I am trying to come up with a way to turn all of this code into either a loop or a function to simplify my script.

Here is some sample data:

set.seed(1)
dat <- data.frame(sc1 = sample(LETTERS[1:6], 15, replace = TRUE),
                  sc1_n = sample(1:100, 15),
                  sc2 = sample(LETTERS[1:6], 15, replace = TRUE),
                  sc2_n = sample(1:100, 15),
                  sc3 = sample(LETTERS[1:6], 15, replace = TRUE),
                  sc3_n = sample(1:100, 15),
                  ec1 = sample(LETTERS[1:6], 15, replace = TRUE),
                  ec1_n = sample(1:100, 15),
                  ec2 = sample(LETTERS[1:6], 15, replace = TRUE),
                  ec2_n = sample(1:100, 15),
                  ec3 = sample(LETTERS[1:6], 15, replace = TRUE),
                  ec3_n = sample(1:100, 15),
                  area = sample(1:100, 15))

I iterate through each unique value of sc1 (A-F, n = 6), sc2 (A-F, n = 6), and sc3 (A-F, n = 6) to calculate a value (the percentage in the matching _n column, times 0.01, times area). I then add the three values together to create another column named A, B, C, D, E, or F, with an 's' appended to indicate that it is a value for s and not e. Once I finish with sc1, sc2, and sc3, I repeat the same process for ec1, ec2, and ec3.
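Put as a formula, the value for class letter x and prefix p in each row is the sum over k = 1, 2, 3 of `pck_n * 0.01 * area * (pck == x)`. A minimal sketch of that calculation as one R helper, assuming the column layout of the sample data above; `class_total` is a hypothetical name, not part of base R:

```r
# Hypothetical helper (not part of base R): per-row total for one class letter
# and one prefix ("s" or "e"). Assumes the sample-data layout:
# <prefix>c1..<prefix>c3 hold class codes, <prefix>c1_n..<prefix>c3_n hold percentages.
class_total <- function(dat, prefix, letter) {
  total <- numeric(nrow(dat))
  for (k in 1:3) {
    code <- dat[[paste0(prefix, "c", k)]]
    pct  <- dat[[paste0(prefix, "c", k, "_n")]]
    # percent -> fraction, scaled by area; (code == letter) zeroes non-matching rows
    total <- total + pct * 0.01 * dat$area * (code == letter)
  }
  total
}
```

For example, `dat$As <- class_total(dat, "s", "A")` computes the same As column as the first four transform() lines below, without creating A1s, A2s, or A3s. The comparison `code == letter` works whether the code columns are character or factor.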

Here are the 40 lines of code I currently use to generate the columns and values I need:

dat <- transform(dat,A1s = (sc1_n * 0.01) * (area) * (sc1 == "A")) #create new column A1s, and calculates a number if sc1=='A'
dat <- transform(dat,A2s = (sc2_n * 0.01) * (area) * (sc2 == "A")) #create new column A2s, and calculates a number if sc2=='A'
dat <- transform(dat,A3s = (sc3_n * 0.01) * (area) * (sc3 == "A")) #same as above, except A3s and where sc3=='A'
dat <- transform(dat,As = A1s + A2s + A3s) #I really don't need A1s, A2s, or A3s, except to calculate this column, As
dat <- transform(dat,B1s = (sc1_n * 0.01) * (area) * (sc1 == "B"))
dat <- transform(dat,B2s = (sc2_n * 0.01) * (area) * (sc2 == "B"))
dat <- transform(dat,B3s = (sc3_n * 0.01) * (area) * (sc3 == "B"))
dat <- transform(dat,Bs = B1s + B2s + B3s)
dat <- transform(dat,C1s = (sc1_n * 0.01) * (area) * (sc1 == "C"))
dat <- transform(dat,C2s = (sc2_n * 0.01) * (area) * (sc2 == "C"))
dat <- transform(dat,C3s = (sc3_n * 0.01) * (area) * (sc3 == "C"))
dat <- transform(dat,Cs = C1s + C2s + C3s)
dat <- transform(dat,D1s = (sc1_n * 0.01) * (area) * (sc1 == "D"))
dat <- transform(dat,D2s = (sc2_n * 0.01) * (area) * (sc2 == "D"))
dat <- transform(dat,D3s = (sc3_n * 0.01) * (area) * (sc3 == "D"))
dat <- transform(dat,Ds = D1s + D2s + D3s)
dat <- transform(dat,E1s = (sc1_n * 0.01) * (area) * (sc1 == "E"))
dat <- transform(dat,E2s = (sc2_n * 0.01) * (area) * (sc2 == "E"))
dat <- transform(dat,E3s = (sc3_n * 0.01) * (area) * (sc3 == "E"))
dat <- transform(dat,Es = E1s + E2s + E3s)
dat <- transform(dat,F1s = (sc1_n * 0.01) * (area) * (sc1 == "F"))
dat <- transform(dat,F2s = (sc2_n * 0.01) * (area) * (sc2 == "F"))
dat <- transform(dat,F3s = (sc3_n * 0.01) * (area) * (sc3 == "F"))
dat <- transform(dat,Fs = F1s + F2s + F3s)

dat <- transform(dat,A1e = (ec1_n * 0.01) * (area) * (ec1 == "A"))
dat <- transform(dat,A2e = (ec2_n * 0.01) * (area) * (ec2 == "A"))
dat <- transform(dat,A3e = (ec3_n * 0.01) * (area) * (ec3 == "A"))
dat <- transform(dat,Ae = A1e + A2e + A3e)
dat <- transform(dat,B1e = (ec1_n * 0.01) * (area) * (ec1 == "B"))
dat <- transform(dat,B2e = (ec2_n * 0.01) * (area) * (ec2 == "B"))
dat <- transform(dat,B3e = (ec3_n * 0.01) * (area) * (ec3 == "B"))
dat <- transform(dat,Be = B1e + B2e + B3e)
dat <- transform(dat,C1e = (ec1_n * 0.01) * (area) * (ec1 == "C"))
dat <- transform(dat,C2e = (ec2_n * 0.01) * (area) * (ec2 == "C"))
dat <- transform(dat,C3e = (ec3_n * 0.01) * (area) * (ec3 == "C"))
dat <- transform(dat,Ce = C1e + C2e + C3e)
dat <- transform(dat,D1e = (ec1_n * 0.01) * (area) * (ec1 == "D"))
dat <- transform(dat,D2e = (ec2_n * 0.01) * (area) * (ec2 == "D"))
dat <- transform(dat,D3e = (ec3_n * 0.01) * (area) * (ec3 == "D"))
dat <- transform(dat,De = D1e + D2e + D3e)
dat <- transform(dat,E1e = (ec1_n * 0.01) * (area) * (ec1 == "E"))
dat <- transform(dat,E2e = (ec2_n * 0.01) * (area) * (ec2 == "E"))
dat <- transform(dat,E3e = (ec3_n * 0.01) * (area) * (ec3 == "E"))
dat <- transform(dat,Ee = E1e + E2e + E3e)
dat <- transform(dat,F1e = (ec1_n * 0.01) * (area) * (ec1 == "F"))
dat <- transform(dat,F2e = (ec2_n * 0.01) * (area) * (ec2 == "F"))
dat <- transform(dat,F3e = (ec3_n * 0.01) * (area) * (ec3 == "F"))
dat <- transform(dat,Fe = F1e + F2e + F3e)

I'm sure there must be a way to do this smartly and efficiently with lists and loops, or at least a function, but I've been looking and haven't found one.

-al

How about a transformation like this:

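for (p in c("s", "e")) {
   g <- dat[, paste0(p, "c", 1:3)]        # class-code columns, e.g. sc1, sc2, sc3
   n <- dat[, paste0(p, "c", 1:3, "_n")]  # matching percentage columns
   for (x in LETTERS[1:6]) {              # A-F; LETTERS[1:5] would skip F
       # (g == x) is a logical matrix, so non-matching cells contribute 0
       dat[, paste0(x, p)] <- rowSums(n * 0.01 * (g == x) * dat$area)
   }
}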

Here we loop over the two sets of columns, those with the "s" prefix and those with the "e" prefix, and extract the subset of columns belonging to that prefix. Next we loop over the class letters and, for each one, sum the three matching values in every row. The idea is to take advantage of as much of the information stored in the column names as possible. This also avoids creating the temporary columns you don't need (A1s, A2s, etc.).
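Since the question also asks for a function, here is a minimal sketch that wraps the same loop in one, assuming the sc/ec column naming of the sample data; `add_class_totals` and its `prefixes` and `classes` arguments are hypothetical names chosen for illustration:

```r
# Hypothetical wrapper around the loop above (illustrative name, not a library API).
# Assumes columns <prefix>c1..3 (class codes) and <prefix>c1_n..3_n (percentages).
add_class_totals <- function(dat, prefixes = c("s", "e"), classes = LETTERS[1:6]) {
  for (p in prefixes) {
    g <- dat[, paste0(p, "c", 1:3)]          # class codes, e.g. sc1, sc2, sc3
    n <- dat[, paste0(p, "c", 1:3, "_n")]    # matching percentages
    for (x in classes) {
      # (g == x) is a logical matrix, so non-matching cells contribute 0
      dat[[paste0(x, p)]] <- rowSums(n * 0.01 * (g == x) * dat$area)
    }
  }
  dat  # return the modified copy; the caller reassigns it
}
```

Calling `dat <- add_class_totals(dat)` on the sample data should add the twelve columns As to Fs and Ae to Fe; passing `prefixes = "s"` would restrict it to the s columns. Returning the data frame rather than modifying it in place follows R's usual copy-on-modify behavior.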

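Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.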
