简体   繁体   中英

Loop over columns R

I have a data.frame in R that consists in a number of columns with numerical values. Like this:

   A       B      C
0.6057  0.1644  6.93
0.5723  0.117   6.59
0.5614  0.1552  7.02
0.4102  0.1059  5.24
0.4945  0.0857  6.64
0.5157  0.0747  7.06
0.7785  0.1394  5.21
0.5492  0.1557  6.06
0.5411  0.1884  5.68
0.6622  0.148   6.1

For each of these columns, I want to create a new column containing the quartile values. I have no problem doing it over one column at a time using this formula:

tableOne <- within(data, quartile <-
                    as.integer(cut(A, quantile(A, probs=0:5/5,na.rm=T))))

But as I have 100 columns with different names, I wanted to loop over each column separately.

I tried a loop without success:

for(i in names(data)){
  tableOne <- within(data, quarti <- as.integer(cut(i, quantile(i, probs=0:5/5,na.rm=T))))
}

I get the following error:

Error in cut.default(i, quantile(i, probs = 0:5/5, na.rm = T)) : 
  'x' must be numeric

I also tried apply function:

df.two <- lapply(df, function(x) within(data, quartile <- as.integer(cut(x, quantile(x, probs=0:5/5,na.rm=T)))))

with no success:

Error during wrapup: argument "obj" is missing, with no default
Error during wrapup: target context is not on the stack

Any advices on how to iterate my functions over all the columns and get all results in the same data.frame?

Thanks a lot

See end of answer for a better approach, this one is for easy understanding of the steps.

I'm unsure what you're willing to do, but maybe this:

df2<- as.data.frame( lapply( df, function(x){
  as.integer( cut(x, quantile(x, probs=(0:5)/5, na.rm=T)))
}))
colnames(df2) <- paste0("quartile_",colnames(df))
df3 <- cbind(df,df2)

Which gives:

        A      B    C quartile_A quartile_B quartile_C
1  0.6057 0.1644 6.93          4          5          4
2  0.5723 0.1170 6.59          4          2          3
3  0.5614 0.1552 7.02          3          4          5
4  0.4102 0.1059 5.24         NA          2          1
5  0.4945 0.0857 6.64          1          1          4
6  0.5157 0.0747 7.06          2         NA          5
7  0.7785 0.1394 5.21          5          3         NA
8  0.5492 0.1557 6.06          3          4          2
9  0.5411 0.1884 5.68          2          5          2
10 0.6622 0.1480 6.10          5          3          3

Datas used:

> dput(df)
structure(list(A = c(0.6057, 0.5723, 0.5614, 0.4102, 0.4945, 
0.5157, 0.7785, 0.5492, 0.5411, 0.6622), B = c(0.1644, 0.117, 
0.1552, 0.1059, 0.0857, 0.0747, 0.1394, 0.1557, 0.1884, 0.148
), C = c(6.93, 6.59, 7.02, 5.24, 6.64, 7.06, 5.21, 6.06, 5.68, 
6.1)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-10L))

As per @DavidArenburg comment below a better way to achieve the same result is:

df[paste0("quartile_",colnames(df))] <- lapply(df, function(x) as.integer(cut(x, quantile(x, probs=(0:5)/5, na.rm = TRUE))))

This avoid creating a new dataframe and copying it over at end.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM