Loop over columns R

Question

I have a data.frame in R that consists in a number of columns with numerical values. Like this:

   A       B      C
0.6057  0.1644  6.93
0.5723  0.117   6.59
0.5614  0.1552  7.02
0.4102  0.1059  5.24
0.4945  0.0857  6.64
0.5157  0.0747  7.06
0.7785  0.1394  5.21
0.5492  0.1557  6.06
0.5411  0.1884  5.68
0.6622  0.148   6.1

For each of these columns, I want to create a new column containing the quartile values. I have no problem doing it over one column at a time using this formula:

tableOne <- within(data, quartile <-
                    as.integer(cut(A, quantile(A, probs=0:5/5,na.rm=T))))

But as I have 100 columns with different names, I wanted to loop over each column separately.

I tried a loop without success:

for(i in names(data)){
  tableOne <- within(data, quarti <- as.integer(cut(i, quantile(i, probs=0:5/5,na.rm=T))))
}

I get the following error:

Error in cut.default(i, quantile(i, probs = 0:5/5, na.rm = T)) : 
  'x' must be numeric

I also tried apply function:

df.two <- lapply(df, function(x) within(data, quartile <- as.integer(cut(x, quantile(x, probs=0:5/5,na.rm=T)))))

with no success:

Error during wrapup: argument "obj" is missing, with no default
Error during wrapup: target context is not on the stack

Any advices on how to iterate my functions over all the columns and get all results in the same data.frame?

Thanks a lot

Answer 1

See end of answer for a better approach, this one is for easy understanding of the steps.

I'm unsure what you're willing to do, but maybe this:

df2<- as.data.frame( lapply( df, function(x){
  as.integer( cut(x, quantile(x, probs=(0:5)/5, na.rm=T)))
}))
colnames(df2) <- paste0("quartile_",colnames(df))
df3 <- cbind(df,df2)

Which gives:

        A      B    C quartile_A quartile_B quartile_C
1  0.6057 0.1644 6.93          4          5          4
2  0.5723 0.1170 6.59          4          2          3
3  0.5614 0.1552 7.02          3          4          5
4  0.4102 0.1059 5.24         NA          2          1
5  0.4945 0.0857 6.64          1          1          4
6  0.5157 0.0747 7.06          2         NA          5
7  0.7785 0.1394 5.21          5          3         NA
8  0.5492 0.1557 6.06          3          4          2
9  0.5411 0.1884 5.68          2          5          2
10 0.6622 0.1480 6.10          5          3          3

Datas used:

> dput(df)
structure(list(A = c(0.6057, 0.5723, 0.5614, 0.4102, 0.4945, 
0.5157, 0.7785, 0.5492, 0.5411, 0.6622), B = c(0.1644, 0.117, 
0.1552, 0.1059, 0.0857, 0.0747, 0.1394, 0.1557, 0.1884, 0.148
), C = c(6.93, 6.59, 7.02, 5.24, 6.64, 7.06, 5.21, 6.06, 5.68, 
6.1)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-10L))

As per @DavidArenburg comment below a better way to achieve the same result is:

df[paste0("quartile_",colnames(df))] <- lapply(df, function(x) as.integer(cut(x, quantile(x, probs=(0:5)/5, na.rm = TRUE))))

This avoid creating a new dataframe and copying it over at end.

Loop over columns R

Question

1 answers

solution1
4 ACCPTED 2015-12-15 16:58:44

Loop over columns R

Question

1 answers

solution1 4 ACCPTED 2015-12-15 16:58:44

solution1
4 ACCPTED 2015-12-15 16:58:44