
Iterating through data frame and changing values on condition [R]

Had to make an account because this sequence of for loops has been annoying me for quite some time.

I have a data frame in R with 1000 rows and 10 columns, where each value is 1, 2 or 3. I would like to recode EVERY entry so that 1 becomes 3, 2 stays 2, and 3 becomes 1. I understand that there are easier ways to do this, such as subsetting each column and hard-coding the condition, but this isn't always ideal, as many of the data sets that I work with have up to 100 columns.

I would like to use a nested loop in order to accomplish this task -- this is what I have thus far:

for(i in 1:nrow(dat_trans)){
  for(j in 1:ncol(dat_trans)){   # every column index; length(dat_trans) alone visits only the last column
    if(dat_trans[i,j] == 1){
      dat_trans[i,j] <- 3
    } else if(dat_trans[i,j] == 2){
      dat_trans[i,j] <- 2
    } else{
      dat_trans[i,j] <- 1
    }
  }
}

So I iterate through the rows and columns, grab every value, and change it based on the if/else conditions. I am still learning R, so if you have any pointers on my code, feel free to point them out.


R is a vectorized language, so you really don't need the inner loop.
Also, if you notice that new value = 4 - old value, you can eliminate the if statements.

for(i in 1:ncol(dat_trans)){
  dat_trans[,i] <- 4 - dat_trans[,i]   # recode the whole column vector at once
}

The loop now iterates over the 10 columns only, instead of over the 1000 rows with an inner loop per row, and each iteration recodes a whole column vector at once. This will greatly improve performance.
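A quick way to check that the column loop gives the same result as the if/else rules is to run both on random data of the size described in the question. This is a sketch; the 1000 × 10 data frame below is simulated, not the asker's data.

```r
set.seed(1)
# Simulated stand-in for the question's data: 1000 rows x 10 columns of 1/2/3
dat_trans <- as.data.frame(matrix(sample(1:3, 1000 * 10, replace = TRUE), nrow = 1000))
original <- dat_trans

# Column-wise loop: 10 iterations, each recoding a whole column vector
for (i in 1:ncol(dat_trans)) {
  dat_trans[, i] <- 4 - dat_trans[, i]
}

# Reference result built from the explicit 1 -> 3, 2 -> 2, 3 -> 1 rules
expected <- original
expected[] <- lapply(original, function(x) ifelse(x == 1, 3, ifelse(x == 2, 2, 1)))

stopifnot(all(dat_trans == expected))
```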

This type of operation is a swap: 1 and 3 trade places while 2 stays put. There are numerous ways to swap values without for loops.

To set up a simple data frame:

df <- data.frame(
  col1 = c(1,2,3),
  col2 = c(2,3,1),
  col3 = c(3,1,2)
)

Using a dummy value (this assumes 4 never occurs in the data):

df[df==1] <- 4
df[df==3] <- 1
df[df==4] <- 3

Using a temporary variable:

dftemp <- df
df[dftemp==1] <- 3
df[dftemp==3] <- 1
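The temporary-copy idea generalizes to any pair of values. A minimal sketch; swap_values() is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: swap every occurrence of a and b in a data frame.
# Both masks are computed on an untouched copy, so the first assignment
# cannot interfere with the second.
swap_values <- function(df, a, b) {
  dftemp <- df
  df[dftemp == a] <- b
  df[dftemp == b] <- a
  df
}

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(2, 3, 1),
  col3 = c(3, 1, 2)
)
swap_values(df, 1, 3)
#   col1 col2 col3
# 1    3    2    1
# 2    2    1    3
# 3    1    3    2
```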

Using multiplication/division and addition/subtraction:

df <- 4 - df
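The constant 4 is the sum of the smallest and largest codes (1 + 3). Assuming consecutive integer codes from lo to hi, the same reversal is new = lo + hi - old; the 1 to 5 scale and the reverse_scale() name below are illustrative assumptions, not from the question:

```r
# Hypothetical helper: reverse a scale with integer codes lo..hi
# (lo <-> hi, lo+1 <-> hi-1, ...)
reverse_scale <- function(x, lo, hi) (lo + hi) - x

reverse_scale(1:3, lo = 1, hi = 3)
# [1] 3 2 1
reverse_scale(1:5, lo = 1, hi = 5)
# [1] 5 4 3 2 1

# Applied to every column of the 1-3 example data frame
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))
df[] <- lapply(df, reverse_scale, lo = 1, hi = 3)
```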

Using Boolean operations:

df[] <- (df==1) * 3 + (df==2) * 2 + (df==3) * 1   # df[] <- keeps df a data frame rather than a matrix

Using a bitwise XOR with 2 (in case you really have a need for speed). It maps 1 to 3 and 3 to 1; cells equal to 2 are masked out because 2 XOR 2 = 0:

df[df!=2] <- sapply(df, function(x){bitwXor(2,x)})[df!=2]
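To see why the cells equal to 2 must be masked, look at XOR with 2 on each code. The second part is a sketch of the same trick written column by column with lapply() and ifelse() (a variation, not the answer's exact code), which keeps df a data frame:

```r
# 1 = 01, 2 = 10, 3 = 11 in binary; XOR with 10 flips the second bit
bitwXor(2L, 1:3)
# [1] 3 0 1

df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))
# XOR only where the value is not 2; 2 is left unchanged
df[] <- lapply(df, function(x) ifelse(x == 2, x, bitwXor(2L, as.integer(x))))
```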

If a nested for loop is required, the switch function is a good option.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(df[j,i],3,2,1)
  }
}
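With a numeric first argument, switch() returns the alternative at that position, so switch(2, 3, 2, 1) gives 2. If you want to keep switch() but drop the explicit loops, one option is to wrap it in vapply() per column. A sketch, assuming every cell is 1, 2 or 3 (any other value makes switch() return NULL, which vapply() rejects with an error):

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# For each column, map every value through switch(); vapply() checks
# that each call returns exactly one number
df[] <- lapply(df, function(col) {
  vapply(col, function(v) switch(v, 3, 2, 1), numeric(1))
})
df
#   col1 col2 col3
# 1    3    2    1
# 2    2    1    3
# 3    1    3    2
```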

Character labels can be used in switch() if the values are not as conveniently indexed as 1, 2 and 3.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(as.character(df[j,i]),
                      "1" = 3,
                      "2" = 2,
                      "3" = 1)
  }
}
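A related way to handle arbitrary codes is a named lookup vector indexed by each value's character form. This swaps in a lookup-table technique instead of switch(); the lookup name and its entries are illustrative assumptions:

```r
# Named vector: names are the old values, elements are the new values
lookup <- c("1" = 3, "2" = 2, "3" = 1)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# Index the lookup by each value as a string; unname() drops the names
# that indexing carries over. Values with no entry in lookup become NA.
df[] <- lapply(df, function(x) unname(lookup[as.character(x)]))
```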

This sounds like a merge / join operation.

set.seed(42)
dat_trans <- as.data.frame(
  setNames(lapply(1:3, function(ign) sample(1:3, size=10, replace=TRUE)),
           c("V1", "V2", "V3"))
)
dat_trans
#    V1 V2 V3
# 1   3  2  3
# 2   3  3  1
# 3   1  3  3
# 4   3  1  3
# 5   2  2  1
# 6   2  3  2
# 7   3  3  2
# 8   1  1  3
# 9   2  2  2
# 10  3  2  3

newvals <- data.frame(old = c(1, 3), new = c(3, 1))
newvals
#   old new
# 1   1   3
# 2   3   1
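The same join logic can be sketched in base R without dplyr/tidyr: match each value against newvals$old and keep the original value where there is no match. This rebuilds small example data (dat is a hypothetical name) rather than reusing the random dat_trans:

```r
newvals <- data.frame(old = c(1, 3), new = c(3, 1))

dat <- data.frame(V1 = c(3, 1, 2), V2 = c(2, 3, 1))

dat[] <- lapply(dat, function(x) {
  idx <- match(x, newvals$old)             # row of newvals for each value, NA if absent
  ifelse(is.na(idx), x, newvals$new[idx])  # keep x where there is no replacement
})
dat
#   V1 V2
# 1  1  2
# 2  3  1
# 3  2  3
```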

Using dplyr and tidyr:

library(dplyr)
library(tidyr) # gather, spread
dat_trans %>%
  mutate(rn = row_number()) %>%
  gather(k, v, -rn) %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = if_else(is.na(new), v, new)) %>%
  select(-new) %>%
  spread(k, v) %>%
  select(-rn)
#    V1 V2 V3
# 1   1  2  1
# 2   1  1  3
# 3   3  1  1
# 4   1  3  1
# 5   2  2  3
# 6   2  1  2
# 7   1  1  2
# 8   3  3  1
# 9   2  2  2
# 10  1  2  1

(The need for rn is likely due to my use of an older version of tidyr: I'm at 0.8.2, though 1.0.0 has recently been released. That release did a lot of work on spread/gather and introduced the pivot_* functions, which are likely much smoother at this. If you have a more recent version, try this without the rn portions.)


Or a much-more-direct approach using a "recode" mindset:

dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], car::recode, "1=3; 3=1")
# or
dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], dplyr::recode, '1' = 3L, '3' = 1L)

You could use an assignment matrix am: match() each value in a column of df1 against column 1 of am, take the corresponding entry from column 2, and assign the result back to df1. Do this inside an lapply(), of course.

df1
#   V1 V2 V3
# 1  1  2  1
# 2  1  2  1
# 3  1  1  2
# 4  1  3  2
# 5  2  3  2

am <- matrix(c(1, 2, 3, 3, 2, 1), 3)
am
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    2
# [3,]    3    1

df1[] <- lapply(df1, function(x) am[match(x, am[,1]), 2])
df1
#   V1 V2 V3
# 1  3  2  3
# 2  3  2  3
# 3  3  3  2
# 4  3  1  2
# 5  2  1  2

Data

df1 <- structure(list(V1 = c(1L, 1L, 1L, 1L, 2L), V2 = c(2L, 2L, 1L, 
3L, 3L), V3 = c(1L, 1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-5L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM