
Subtraction of pairs of columns based on corresponding variable names, using for loop (in R)

I have a wide-formatted data frame that contains variable names such as "per601.199003" (they all begin with "per" followed by 3-4 digit numbers, a full stop . and a number indicating a certain date).

Now for every pair of "per601..." and "per602..." variables I need to subtract the latter from the former: "per601..." - "per602..." .

There are endings that match (e.g. "per601.199003" and "per602.199003"), but there are also endings for which I have only a "per601..." or only a "per602..." version.

For reproducibility, and for the sake of simplicity, let's say these are my two lists of variable names (I obtained them using grep()). In reality, both lists are much longer.

vars_601 <- c("per601.199003", "per601.200201", "per601.2001409")
vars_602 <- c("per602.199003", "per602.200201", "per602.2001702")

Now what I would need is something like this:

for (i in per_601_list) {
  #search corresponding item in per_602_list (i.e. same ending)
  #subtract this latter item from the first item
}

I don't know what your per_60x_list objects are supposed to be, so let me just use the character vectors of column names:

vars_601 <- c("per601.199003", "per601.200201", "per601.2001409")
vars_602 <- c("per602.199003", "per602.200201", "per602.2001702")

And I need some example data to work with, so I'll construct a data frame named df with these column names:

set.seed(1)  # make the random example data reproducible
df <- as.data.frame(matrix(sample(1:100, 60, replace = TRUE), nrow = 10, ncol = 6))
names(df) <- c(vars_601, vars_602)
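If the name vectors are not already at hand, they can be recovered from the column names with grep(), as the question mentions. The following is only a minimal sketch, assuming the columns follow the "per601." / "per602." pattern; the data frame toy and the names found_601 and found_602 are hypothetical and used only for illustration:

```r
# Toy data frame with the six example column names (values are arbitrary).
toy <- as.data.frame(matrix(1:60, nrow = 10, ncol = 6))
names(toy) <- c("per601.199003", "per601.200201", "per601.2001409",
                "per602.199003", "per602.200201", "per602.2001702")

# value = TRUE returns the matching names rather than their positions;
# "\\." matches a literal full stop instead of "any character".
found_601 <- grep("^per601\\.", names(toy), value = TRUE)
found_602 <- grep("^per602\\.", names(toy), value = TRUE)
```

Anchoring the pattern with ^ keeps it from picking up other columns that merely contain "per601" somewhere in the name.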

Now for your loop. For each 601 column we first check, using %in%, that a 602 column with the same date exists; if so, we subtract and assign the result to a new column whose name is built with paste0():

for (i in seq_along(vars_601)) {
    # get the i'th 601 date (everything after the 7-character prefix "per601.")
    thisdate <- substr(vars_601[i], 8, nchar(vars_601[i]))

    # check if there is a 602 column with exactly the same date
    ismatch <- paste0("per602.", thisdate) %in% vars_602

    # if there's a match, subtract: diff.date = per601.date - per602.date
    if (ismatch) {
        df[[paste0("diff.", thisdate)]] <- df[[paste0("per601.", thisdate)]] -
                                           df[[paste0("per602.", thisdate)]]
    }
}

Alternatively and without looping, just get the matching 601 cols in one dataframe, the matching 602 cols in another dataframe, and (after making sure the cols are in the correct order) subtract the two dataframes:

# date part of each name: everything after the 7-character prefix
var_601_dates <- substr(vars_601, 8, nchar(vars_601))
var_602_dates <- substr(vars_602, 8, nchar(vars_602))

df[ , sort(vars_601[var_601_dates %in% var_602_dates])] - 
df[ , sort(vars_602[var_602_dates %in% var_601_dates])]
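The subtraction above returns the differences without storing them, and the result keeps the per601 column names. One possible variant is sketched below, assuming the differences should be added back to the data frame as diff.<date> columns. The names dat and common and the "diff." prefix are illustrative choices, not something fixed by the question:

```r
vars_601 <- c("per601.199003", "per601.200201", "per601.2001409")
vars_602 <- c("per602.199003", "per602.200201", "per602.2001702")

# Example data, as in the answer above (values are random).
set.seed(1)
dat <- as.data.frame(matrix(sample(1:100, 60, replace = TRUE), nrow = 10, ncol = 6))
names(dat) <- c(vars_601, vars_602)

# Strip the prefix to get the date part, whatever its length.
dates_601 <- sub("^per601\\.", "", vars_601)
dates_602 <- sub("^per602\\.", "", vars_602)

# Dates that occur in both sets of names.
common <- intersect(dates_601, dates_602)

# Both selections are built from the same `common` vector, so the columns
# are aligned pairwise without any sorting.
dat[paste0("diff.", common)] <- dat[paste0("per601.", common)] -
                                dat[paste0("per602.", common)]
```

Building both sides from one vector of shared dates avoids relying on sort() to put the two sets of columns in the same order.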
