
How to loop through columns in R

I have a very large data set with 250 string and numeric variables. I want to compare consecutive columns in pairs: for example, compare (take the difference of) the first variable with the second, the third with the fourth, the fifth with the sixth, and so on.
For example (the structure of my data set is similar to this example), I want to compare number.x with number.y, day.x with day.y, school.x with school.y, etc.

number.x<-c(1,2,3,4,5,6,7)
number.y<-c(3,4,5,6,1,2,7)
day.x<-c(1,3,4,5,6,7,8)
day.y<-c(4,5,6,7,8,7,8)
school.x<-c("a","b","b","c","n","f","h")
school.y<-c("a","b","b","c","m","g","h")
city.x<- c(1,2,3,7,5,8,7)
city.y<- c(1,2,3,5,5,7,7) 

Do you mean something like this?

> number.x == number.y
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
> length(which(number.x==number.y))
[1] 1
> school.x == school.y
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> test.day <- day.x == day.y
> test.day
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
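If you only need the number of matching rows, `sum()` on the logical vector counts the `TRUE` values directly (it gives the same count as `length(which(...))`), and `which()` on the negated comparison returns the rows that differ. A minimal sketch using the example vectors (`n.match` and `mismatch.rows` are just illustrative names):

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7)
number.y <- c(3, 4, 5, 6, 1, 2, 7)

# sum() treats TRUE as 1 and FALSE as 0, so this counts the matching rows
n.match <- sum(number.x == number.y)
print(n.match)        # [1] 1

# which() on the negated comparison returns the row indices that differ
mismatch.rows <- which(number.x != number.y)
print(mismatch.rows)  # [1] 1 2 3 4 5 6
```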

EDIT: Given your example variables above, we have:

df <- data.frame(number.x,
             number.y,
             day.x,
             day.y,
             school.x,
             school.y,
             city.x,
             city.y,
             stringsAsFactors=FALSE)

n <- ncol(df)  # number of columns (assumed to be even)

k <- 1
comp <- list()  # comparisons will be stored here

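while (k <= n-1) {
      l <- (k+1)/2                      # position of this pair in comp
      comp[[l]] <- df[,k] == df[,k+1]   # compare column k with column k+1
      k <- k+2                          # jump to the first column of the next pair
}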
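The same pairwise loop can be written without manual index bookkeeping by iterating over the odd column positions with `seq()` and `lapply()`. This is a sketch that, like the loop above, assumes paired columns are adjacent and the column count is even; the `stopifnot()` check makes that assumption explicit, and `first.cols` is an illustrative name.

```r
df <- data.frame(
  number.x = c(1, 2, 3, 4, 5, 6, 7),
  number.y = c(3, 4, 5, 6, 1, 2, 7),
  day.x    = c(1, 3, 4, 5, 6, 7, 8),
  day.y    = c(4, 5, 6, 7, 8, 7, 8),
  school.x = c("a", "b", "b", "c", "n", "f", "h"),
  school.y = c("a", "b", "b", "c", "m", "g", "h"),
  city.x   = c(1, 2, 3, 7, 5, 8, 7),
  city.y   = c(1, 2, 3, 5, 5, 7, 7),
  stringsAsFactors = FALSE
)

n <- ncol(df)
stopifnot(n %% 2 == 0)  # every column must have a partner

# Odd positions 1, 3, 5, ... hold the first column of each pair
first.cols <- seq(1, n, by = 2)

# Compare column k with column k + 1 for every pair; returns a list like comp
comp <- lapply(first.cols, function(k) df[[k]] == df[[k + 1]])
```

The result has the same layout as `comp` from the `while` loop, so element `(k+1)/2` still corresponds to columns `k` and `k+1`.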

After that, you'll have:

> comp
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

[[2]]
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

[[3]]
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

To get the comparison between columns k and k+1, look at element (k+1)/2 of comp; for example, for columns 7 and 8, look at element (7+1)/2 = 4:

> comp[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
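Instead of translating column positions into list positions, you can name each element of `comp` after the variable it compares. The sketch below also pairs columns by their `.x`/`.y` suffix rather than by position, which assumes every `.x` column has a matching `.y` column (checked with `stopifnot()`); `base` is an illustrative name.

```r
df <- data.frame(
  number.x = c(1, 2, 3, 4, 5, 6, 7),
  number.y = c(3, 4, 5, 6, 1, 2, 7),
  day.x    = c(1, 3, 4, 5, 6, 7, 8),
  day.y    = c(4, 5, 6, 7, 8, 7, 8),
  school.x = c("a", "b", "b", "c", "n", "f", "h"),
  school.y = c("a", "b", "b", "c", "m", "g", "h"),
  city.x   = c(1, 2, 3, 7, 5, 8, 7),
  city.y   = c(1, 2, 3, 5, 5, 7, 7),
  stringsAsFactors = FALSE
)

# Variable stems that have an ".x" column: "number", "day", "school", "city"
base <- sub("\\.x$", "", grep("\\.x$", names(df), value = TRUE))
stopifnot(all(paste0(base, ".y") %in% names(df)))

# Compare <stem>.x with <stem>.y; simplify = FALSE keeps a list named by stem
comp <- sapply(base,
               function(b) df[[paste0(b, ".x")]] == df[[paste0(b, ".y")]],
               simplify = FALSE)

print(comp[["city"]])
```

Because pairs are matched by name, this keeps working if the columns are reordered.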

EDIT 2: To add the comparisons as new columns of the data frame:

new.names <- paste0('V', 1:(n/2))  # "V1", "V2", ..., one name per pair

cc <- as.data.frame(comp, optional=TRUE)
names(cc) <- new.names

df.new <- cbind(df, cc)

After that, you have:

> df.new
  number.x number.y day.x day.y school.x school.y city.x city.y    V1    V2    V3    V4
1        1        3     1     4        a        a      1      1 FALSE FALSE  TRUE  TRUE
2        2        4     3     5        b        b      2      2 FALSE FALSE  TRUE  TRUE
3        3        5     4     6        b        b      3      3 FALSE FALSE  TRUE  TRUE
4        4        6     5     7        c        c      7      5 FALSE FALSE  TRUE FALSE
5        5        1     6     8        n        m      5      5 FALSE FALSE FALSE  TRUE
6        6        2     7     7        f        g      8      7 FALSE  TRUE FALSE FALSE
7        7        7     8     8        h        h      7      7  TRUE  TRUE  TRUE  TRUE
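The question also mentions taking the difference of each pair. Equality works for every column type, but for numeric pairs you may want the actual difference instead. A hedged sketch, assuming numeric pairs should be subtracted and all other pairs compared with `==`; the helper `pair.result` and the `diff.` column prefix are hypothetical choices, not part of the answer above.

```r
df <- data.frame(
  number.x = c(1, 2, 3, 4, 5, 6, 7),
  number.y = c(3, 4, 5, 6, 1, 2, 7),
  day.x    = c(1, 3, 4, 5, 6, 7, 8),
  day.y    = c(4, 5, 6, 7, 8, 7, 8),
  school.x = c("a", "b", "b", "c", "n", "f", "h"),
  school.y = c("a", "b", "b", "c", "m", "g", "h"),
  city.x   = c(1, 2, 3, 7, 5, 8, 7),
  city.y   = c(1, 2, 3, 5, 5, 7, 7),
  stringsAsFactors = FALSE
)

n <- ncol(df)
stopifnot(n %% 2 == 0)
first.cols <- seq(1, n, by = 2)

# Hypothetical helper: subtract numeric pairs, otherwise test for equality
pair.result <- function(a, b) {
  if (is.numeric(a) && is.numeric(b)) a - b else a == b
}

res <- lapply(first.cols, function(k) pair.result(df[[k]], df[[k + 1]]))

# Name each result after its variable stem, e.g. "diff.number"
names(res) <- paste0("diff.", sub("\\.x$", "", names(df)[first.cols]))

df.new <- cbind(df, as.data.frame(res))
print(df.new)
```

Here `diff.number`, `diff.day` and `diff.city` hold `x - y`, while `diff.school` holds `TRUE`/`FALSE`.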

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM