Iterate through a data frame and change values based on a condition [R]

Question

Had to make an account because this sequence of for loops has been annoying me for quite some time.

I have a data frame in R with 1000 rows and 10 columns, with each value ranging from 1 to 3. I would like to re-code EVERY entry so that 1 becomes 3, 2 stays 2, and 3 becomes 1. I understand that there are easier ways to do this, such as subsetting each column and hard-coding the condition, but this isn't always ideal, as many of the data sets that I work with have up to 100 columns.

I would like to use a nested loop in order to accomplish this task. This is what I have thus far:

for(i in 1:nrow(dat_trans)){
  for(j in length(dat_trans)){
    if(dat_trans[i,j] == 1){
      dat_trans[i,j] <- 3
    } else if(dat_trans[i,j] == 2){
      dat_trans[i,j] <- 2
    } else{
      dat_trans[i,j] <- 1
    }
  }
}

So I iterate through the first column, grab every value and change it based on the if/else condition. I am still learning R, so if you have any pointers on my code, feel free to point them out.

edit: code

Answer 1

R is a vectorized language, so you really don't need the inner loop.
Also, if you notice that the new value is always 4 minus the old value, you can eliminate the if statements.

for(i in 1:ncol(dat_trans)){
  dat_trans[,i] <- 4 - dat_trans[,i]
}

The loop now runs once per column (10 iterations) instead of once per row (1000 iterations), and each iteration recodes a whole column in one vectorized step. This will greatly improve performance.
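A side note on the nested loop in the question, offered as a sketch rather than a full diagnosis: `for(j in length(dat_trans))` runs its body only once, with `j` equal to the number of columns, so only the last column gets recoded. Looping over `seq_len(ncol(...))` visits every column. The `demo` data frame below is made up for illustration.

```r
# Hypothetical 3 x 3 data frame, for illustration only
demo <- data.frame(a = c(1, 2, 3), b = c(3, 3, 1), c = c(2, 1, 1))

# length() of a data frame is its column count, so this loop sees one value
only_last <- integer(0)
for (j in length(demo)) only_last <- c(only_last, j)

# seq_len(ncol()) yields 1, 2, ..., ncol
every_col <- integer(0)
for (j in seq_len(ncol(demo))) every_col <- c(every_col, j)

print(only_last)
print(every_col)

# Nested loop with the corrected column index
fixed <- demo
for (i in seq_len(nrow(fixed))) {
  for (j in seq_len(ncol(fixed))) {
    fixed[i, j] <- 4 - fixed[i, j]
  }
}
print(fixed)
```

The nested version is still much slower than the column-wise loop above; it is shown only to make the indexing difference visible.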

Answer 2

This type of operation is a swap operation. There are numerous ways to swap values without for loops.

To set up a simple data frame:

df <- data.frame(
  col1 = c(1,2,3),
  col2 = c(2,3,1),
  col3 = c(3,1,2)
)

Using a dummy value:

df[df==1] <- 4
df[df==3] <- 1
df[df==4] <- 3

Using a temporary variable:

dftemp <- df
df[dftemp==1] <- 3
df[dftemp==3] <- 1

Using multiplication/division and addition/subtraction:

df <- 4 - df

Using Boolean operations:

# The right-hand side is a numeric matrix; assigning into df[] keeps df a data frame
df[] <- (df==1) * 3 + (df==2) * 2 + (df==3) * 1

Using a bitwise xor (in case you really have a need for speed):

df[df!=2] <- sapply(df, function(x){bitwXor(2,x)})[df!=2]
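To see why the XOR line needs the `df != 2` mask: XOR with 2 maps 1 to 3 and 3 to 1, but it maps 2 to 0, so the 2s have to be left out of the assignment. A quick check on a plain integer vector (`vals` is an illustrative name):

```r
vals <- c(1L, 2L, 3L)

# XOR with 2 flips the second bit: 1 -> 3, 3 -> 1, but 2 -> 0
xored <- bitwXor(2L, vals)
print(xored)

# Keep the original value wherever it equals 2
swapped <- ifelse(vals != 2L, xored, vals)
print(swapped)
```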

If a nested for loop is required, the switch function is a good option.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(df[j,i],3,2,1)
  }
}

Text can be used if the values are not as nicely indexed as 1, 2, and 3.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(as.character(df[j,i]),
                      "1" = 3,
                      "2" = 2,
                      "3" = 1)
  }
}
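The same text-keyed mapping can also be applied without loops by indexing a named vector. This is a sketch of one possible alternative, not part of the approaches listed above; `lookup` and `df_demo` are illustrative names.

```r
# Same toy data as the simple data frame above, under a different name
df_demo <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(2, 3, 1),
  col3 = c(3, 1, 2)
)

# Names are the old values (as text), elements are the new values
lookup <- c("1" = 3, "2" = 2, "3" = 1)

# Index the lookup by name, one whole column at a time;
# df_demo[] keeps the data frame structure
df_demo[] <- lapply(df_demo, function(x) unname(lookup[as.character(x)]))
print(df_demo)
```

Any value without an entry in `lookup` would come back as `NA`, so the table should cover every value that can occur.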

Answer 3

This sounds like a merge/join operation.

set.seed(42)
dat_trans <- as.data.frame(
  setNames(lapply(1:3, function(ign) sample(1:3, size=10, replace=TRUE)),
           c("V1", "V2", "V3"))
)
dat_trans
#    V1 V2 V3
# 1   3  2  3
# 2   3  3  1
# 3   1  3  3
# 4   3  1  3
# 5   2  2  1
# 6   2  3  2
# 7   3  3  2
# 8   1  1  3
# 9   2  2  2
# 10  3  2  3

newvals <- data.frame(old = c(1L, 3L), new = c(3L, 1L))  # integers, matching the integer columns of dat_trans
newvals
#   old new
# 1   1   3
# 2   3   1

Using dplyr and tidyr:

library(dplyr)
library(tidyr) # gather, spread
dat_trans %>%
  mutate(rn = row_number()) %>%
  gather(k, v, -rn) %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = if_else(is.na(new), v, new)) %>%
  select(-new) %>%
  spread(k, v) %>%
  select(-rn)
#    V1 V2 V3
# 1   1  2  1
# 2   1  1  3
# 3   3  1  1
# 4   1  3  1
# 5   2  2  3
# 6   2  1  2
# 7   1  1  2
# 8   3  3  1
# 9   2  2  2
# 10  1  2  1

(The need for rn is likely due to my use of an older version of tidyr: I'm at 0.8.2, though 1.0.0 has recently been released. That release did a lot of enhancement/work on spread/gather and introduced the pivot_* functions, which are likely much smoother at this. If you have a more recent version, try this without the rn portions.)
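For readers on tidyr 1.0.0 or later, the same pipeline might look like this with `pivot_longer()` and `pivot_wider()`. This is a sketch on a small made-up data frame (`dat` is a hypothetical name), not a tested replacement for the code above. It keeps `rn`, because `pivot_wider()` still needs some column that identifies each original row, and it uses `coalesce()` in place of the `if_else()` step.

```r
library(dplyr)
library(tidyr)  # pivot_longer, pivot_wider (tidyr >= 1.0.0)

# Hypothetical small input, for illustration only
dat <- data.frame(V1 = c(3L, 1L, 2L), V2 = c(2L, 3L, 1L))
newvals <- data.frame(old = c(1L, 3L), new = c(3L, 1L))

res <- dat %>%
  mutate(rn = row_number()) %>%                        # row id, needed to widen again
  pivot_longer(-rn, names_to = "k", values_to = "v") %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = coalesce(new, v)) %>%                     # keep v where no replacement exists
  select(-new) %>%
  pivot_wider(names_from = k, values_from = v) %>%
  select(-rn)
print(res)
```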

Or a much-more-direct approach using a "recode" mindset:

dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], car::recode, "1=3; 3=1")
# or
dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], dplyr::recode, '1' = 3L, '3' = 1L)

Answer 4

You could use an assignment matrix am. match() each value of a column of df1 against column 1 of am, but select column 2, then assign the result back to df1. In a lapply(), of course.

df1
#   V1 V2 V3
# 1  1  2  1
# 2  1  2  1
# 3  1  1  2
# 4  1  3  2
# 5  2  3  2

am <- matrix(c(1, 2, 3, 3, 2, 1), 3)
am
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    2
# [3,]    3    1

df1[] <- lapply(df1, function(x) am[match(x, am[,1]), 2])
df1
#   V1 V2 V3
# 1  3  2  3
# 2  3  2  3
# 3  3  3  2
# 4  3  1  2
# 5  2  1  2

Data

df1 <- structure(list(V1 = c(1L, 1L, 1L, 1L, 2L), V2 = c(2L, 2L, 1L, 
3L, 3L), V3 = c(1L, 1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-5L))
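One caveat with the match() approach: a value that does not appear in column 1 of am comes back as NA. If the lookup table only lists the values that change, a small wrapper can pass everything else through. A minimal sketch, where `recode_partial` and `df_demo` are hypothetical names:

```r
# Replace values found in `old` with the matching entry of `new`;
# values with no match are returned unchanged
recode_partial <- function(x, old, new) {
  idx <- match(x, old)
  ifelse(is.na(idx), x, new[idx])
}

# Hypothetical data containing a value (4) that the table does not cover
df_demo <- data.frame(V1 = c(1L, 2L, 3L, 4L), V2 = c(3L, 3L, 1L, 2L))
df_demo[] <- lapply(df_demo, recode_partial, old = c(1L, 3L), new = c(3L, 1L))
print(df_demo)
```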

4 answers

Answer 1: score 2, accepted, 2019-09-18 17:19:46

Answer 2: score 1, 2019-09-18 18:49:10

Answer 3: score 0, 2019-09-18 17:21:14

Answer 4: score 0, 2019-09-18 18:08:16
