Iterating through data frame and changing values on condition [R]

Had to make an account because this sequence of for loops has been annoying me for quite some time.

I have a data frame in R with 1000 rows and 10 columns, with each value in the range 1 to 3. I would like to re-code EVERY entry so that 1 becomes 3, 2 stays 2, and 3 becomes 1. I understand that there are easier ways to do this, such as subsetting each column and hard-coding the condition, but this isn't always ideal, as many of the data sets that I work with have up to 100 columns.

I would like to use a nested loop in order to accomplish this task. This is what I have thus far:

for(i in 1:nrow(dat_trans)){
  for(j in length(dat_trans)){
    if(dat_trans[i,j] == 1){
      dat_trans[i,j] <- 3
    } else if(dat_trans[i,j] == 2){
      dat_trans[i,j] <- 2
    } else{
      dat_trans[i,j] <- 1
    }
  }
}

So I iterate through the first column, grab every value and change it based on the if/else condition. I am still learning R, so if you have any pointers on my code, feel free to point them out.

edit: code

R is a vectorized language, so you really don't need a loop over the individual rows.
Also, if you notice that 4 - "old value" = "new value", you can eliminate the if statements.

for(i in 1:ncol(dat_trans)){
        dat_trans[,i] <- 4-dat_trans[,i]
}

The loop now iterates across the columns, for only 10 iterations, as opposed to 1000 iterations over the rows. This will greatly improve performance.
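As a side note on the loop in the question: length(dat_trans) is a single number (the column count), so for(j in length(dat_trans)) visits only the last column. A minimal sketch of the nested loop with both indices generated by seq_len(), checked against the 4 - x shortcut; the small data frame here is made up for illustration and is not the asker's data:

```r
# Small stand-in for the question's data (values assumed for illustration)
dat_trans <- data.frame(a = c(1, 2, 3), b = c(3, 3, 1), c = c(2, 1, 1))
expected <- 4 - dat_trans

# seq_len() yields one index per row / per column
for (i in seq_len(nrow(dat_trans))) {
  for (j in seq_len(ncol(dat_trans))) {
    if (dat_trans[i, j] == 1) {
      dat_trans[i, j] <- 3
    } else if (dat_trans[i, j] == 3) {
      dat_trans[i, j] <- 1
    }
    # a value of 2 is left unchanged
  }
}

# TRUE when the loop and the arithmetic shortcut agree
print(all(as.matrix(dat_trans) == as.matrix(expected)))
```

This is still far slower than the column-wise version on large data; it is only meant to show where the original loop went wrong.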

This type of operation is a swap operation. There are numerous ways to swap values without for loops.

To set up a simple data frame:

df <- data.frame(
  col1 = c(1,2,3),
  col2 = c(2,3,1),
  col3 = c(3,1,2)
)

Using a dummy value:

df[df==1] <- 4
df[df==3] <- 1
df[df==4] <- 3

Using a temporary variable:

dftemp <- df
df[dftemp==1] <- 3
df[dftemp==3] <- 1

Using multiplication/division and addition/subtraction:

df <- 4 - df

Using Boolean operations:

# Assign into df[] so the result stays a data frame instead of becoming a matrix
df[] <- (df==1) * 3 + (df==2) * 2 + (df==3) * 1

Using a bitwise xor (in case you really have a need for speed):

df[df!=2] <- sapply(df, function(x){bitwXor(2,x)})[df!=2]
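To check that these variants agree, here is a small sketch that applies several of them to fresh copies of the example data frame and compares each result with 4 - df. The copy names (a, b, d, e) are only for this illustration:

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# Dummy value: park the 1s at 4 while the 3s are rewritten
a <- df
a[a == 1] <- 4
a[a == 3] <- 1
a[a == 4] <- 3

# Temporary copy: test against the untouched original
b <- df
b[df == 1] <- 3
b[df == 3] <- 1

# Arithmetic
d <- 4 - df

# Bitwise xor with 2 swaps 1 and 3; the 2s are skipped
e <- df
e[df != 2] <- sapply(df, function(x) bitwXor(2L, as.integer(x)))[df != 2]

same <- c(dummy = all(a == d), temp = all(b == d), xor = all(e == d))
print(same)
```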

If a nested for loop is required, the switch function is a good option.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(df[j,i],3,2,1)
  }
}

Text can be used if the values are not as nicely indexed as 1, 2, and 3.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(as.character(df[j,i]),
                      "1" = 3,
                      "2" = 2,
                      "3" = 1)
  }
}
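The same text-keyed mapping can probably be applied without the loops by indexing a named vector. This is a sketch of a different technique from the switch version above, not a restatement of it; lookup and recoded are names made up for the example:

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# Names are the old values (as text), elements are the new values
lookup <- c("1" = 3, "2" = 2, "3" = 1)

recoded <- df
# Index the lookup by name, one whole column at a time
recoded[] <- lapply(df, function(x) unname(lookup[as.character(x)]))
print(recoded)
```

Values with no entry in lookup would come back as NA, so the vector needs to cover every value that occurs.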

This sounds like a merge/join operation.

set.seed(42)
dat_trans <- as.data.frame(
  setNames(lapply(1:3, function(ign) sample(1:3, size=10, replace=TRUE)),
           c("V1", "V2", "V3"))
)
dat_trans
#    V1 V2 V3
# 1   3  2  3
# 2   3  3  1
# 3   1  3  3
# 4   3  1  3
# 5   2  2  1
# 6   2  3  2
# 7   3  3  2
# 8   1  1  3
# 9   2  2  2
# 10  3  2  3

newvals <- data.frame(old = c(1, 3), new = c(3, 1))
newvals
#   old new
# 1   1   3
# 2   3   1

Using dplyr and tidyr:

library(dplyr)
library(tidyr) # gather, spread
dat_trans %>%
  mutate(rn = row_number()) %>%
  gather(k, v, -rn) %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = if_else(is.na(new), v, new)) %>%
  select(-new) %>%
  spread(k, v) %>%
  select(-rn)
#    V1 V2 V3
# 1   1  2  1
# 2   1  1  3
# 3   3  1  1
# 4   1  3  1
# 5   2  2  3
# 6   2  1  2
# 7   1  1  2
# 8   3  3  1
# 9   2  2  2
# 10  1  2  1

(The need for rn is likely due to my use of an older version of tidyr: I'm at 0.8.2, though 1.0.0 has recently been released. That release did a lot of enhancement work on spread/gather and introduced the pivot_* functions, which are likely much smoother at this. If you have a more recent version, try this without the rn portions.)
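For readers on tidyr 1.0.0 or later, a rough sketch of the same pipeline with pivot_longer() and pivot_wider() in place of gather() and spread(). The small data frame is made up for illustration. The rn column is kept here, on the assumption that pivot_wider() still needs something to identify each row when values repeat; base ifelse() is used to sidestep type strictness in older dplyr versions:

```r
library(dplyr)
library(tidyr)

# Illustrative data, not the seeded sample from above
dat_trans <- data.frame(V1 = c(3, 1, 2), V2 = c(2, 3, 1), V3 = c(1, 1, 3))
newvals <- data.frame(old = c(1, 3), new = c(3, 1))

recoded <- dat_trans %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn, names_to = "k", values_to = "v") %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = ifelse(is.na(new), v, new)) %>%   # keep v where no mapping exists
  select(-new) %>%
  pivot_wider(names_from = k, values_from = v) %>%
  select(-rn)

# pivot_wider() returns a tibble; convert back for comparison
recoded <- as.data.frame(recoded)
print(recoded)
```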


Or a much more direct approach using a "recode" mindset:

dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], car::recode, "1=3; 3=1")
# or
dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], dplyr::recode, '1' = 3L, '3' = 1L)

You could use an assignment matrix am: match() each value of a column of df1 against column 1 of am, take the corresponding entry from column 2, and assign the result back to df1. In a lapply(), of course.

df1
#   V1 V2 V3
# 1  1  2  1
# 2  1  2  1
# 3  1  1  2
# 4  1  3  2
# 5  2  3  2

am <- matrix(c(1, 2, 3, 3, 2, 1), 3)
am
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    2
# [3,]    3    1

df1[] <- lapply(df1, function(x) am[match(x, am[,1]), 2])
df1
#   V1 V2 V3
# 1  3  2  3
# 2  3  2  3
# 3  3  3  2
# 4  3  1  2
# 5  2  1  2

Data

df1 <- structure(list(V1 = c(1L, 1L, 1L, 1L, 2L), V2 = c(2L, 2L, 1L, 
3L, 3L), V3 = c(1L, 1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-5L))
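If the assignment matrix lists only the values that change, match() returns NA for everything else. A minimal sketch of one way to fall back to the original value in that case; am_partial and recode_col are hypothetical names introduced for this example:

```r
df1 <- data.frame(V1 = c(1L, 1L, 1L, 1L, 2L),
                  V2 = c(2L, 2L, 1L, 3L, 3L),
                  V3 = c(1L, 1L, 2L, 2L, 2L))

# Column 1: old values, column 2: new values; 2 is deliberately absent
am_partial <- matrix(c(1, 3, 3, 1), 2)

recode_col <- function(x, am) {
  idx <- match(x, am[, 1])            # NA where x has no entry in am
  ifelse(is.na(idx), x, am[idx, 2])   # keep x itself when unmatched
}

out <- df1
out[] <- lapply(df1, recode_col, am = am_partial)
print(out)
```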
