
Applying a function to every row on each n number of columns in R

My data contains consecutive columns 1, 2, ..., 2000. I want to apply a function that returns 3 values for each group of 100 columns, for each row.

The data look like this:

  1       2        3    .....   2000  
0.01    0.0       0.002         0.03
0.005   0.002     0.011         0.04
0.001   0.003     0.004         0.0

Here is the code I tried:

prep_data <- function(df){
  # Create column names grp1_1, grp1_2, grp1_3, ..., grp20_3
  colnms <- c()
  for(i in seq(1, 20, 1)){
    for(j in seq(1, 3, 1)){
      f <- paste0("grp", i, "_", j)
      colnms <- c(colnms, f)
    }
  }
  # Pre-allocate the output: one row per input row, 3 values x 20 groups
  trans <- data.frame(matrix(ncol = 60, nrow = NROW(df)))
  colnames(trans) <- colnms

  # Loop over every row
  for (i in 1:NROW(df)){
    X <- c()
    # Loop over each group of 100 columns (start columns 1, 101, ..., 1901)
    for(j in seq(1, 1901, 100)){
      end <- j + 99
      # Row i, columns j..end (df[i] would select column i, not row i)
      tmp <- subset(df[i, ], select = j:end)
      # Apply the function to the 100 columns of the current row to get 3 values
      X <- c(X, MY_FUNC(tmp))
    }
    # Store the 60 values for the current row
    trans[i, ] <- X
  }
  return(trans)
}

The expected output (a data frame of 60 columns) is as follows:

grp1_1  grp1_2    grp1_3 .....  grp20_3  
0.01    0.0       0.002         0.03
0.005   0.002     0.011         0.04
0.001   0.003     0.004         0.0

My code runs, but it's too slow, probably because all the loops make it inefficient.

Thanks in advance.

Here is one approach:

Let d be your 3 rows x 2000 columns data.table, with column names as.character(1:2000) (see below for the generation of fake data). We add a row identifier using .I, then melt the data to long format, adding grp, a column-group identifier (i.e. one that identifies the 20 sets of 100 columns). Then we apply your function myfunc (see below for the stand-in function used in this example) by row and group, and cast the result back to wide format. (I used stringr::str_pad to add a leading 0 to the group number.)

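library(data.table)

# add row identifier
d[, row:=.I]

# melt and add col group identifier (20 groups, each 100 columns x 3 rows = 300 values)
dm = melt(d,id.vars = "row",variable.factor = FALSE)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row;
# frow indexes the three values returned by myfunc within each row/group
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]

# cast wide (3 rows long, 60 columns wide);
# note: each output row is one of the three values returned by myfunc,
# and column grpGG_R holds column group GG of input row R
dcast(
  result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
  frow~v,value.var="V1"
  )[, frow:=NULL][]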

Output (first six columns only):

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
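
In the output above, the rows correspond to the three values returned by myfunc and the column suffix is the input row. If you would rather have one output row per input row, as in the expected output in the question, a variant along these lines might work. This is only a sketch under a few assumptions: the names k, val, res and wide are made up for illustration, the fake data is built from a matrix m instead of the loop shown under Input, the group is derived from the column number with integer division, and sprintf is used for the zero padding in place of stringr::str_pad.

```r
library(data.table)

# Toy stand-in for the real function (same as in this answer)
myfunc <- function(x) c(mean(x), var(x), sd(x))

# Fake data: 3 rows x 2000 columns named "1".."2000"
set.seed(123)
m <- matrix(runif(3 * 2000), nrow = 3)
d <- as.data.table(m)
setnames(d, as.character(1:2000))

# Row identifier, then long format with one line per (row, column)
d[, row := .I]
dm <- melt(d, id.vars = "row", variable.factor = FALSE)
dm[, variable := as.integer(variable)]

# Columns 1-100 -> group 1, 101-200 -> group 2, ..., 1901-2000 -> group 20
dm[, grp := (variable - 1L) %/% 100L + 1L]

# k indexes the value returned by myfunc (1, 2, 3) within each row/group
res <- dm[, .(k = 1:3, val = myfunc(value)), by = .(row, grp)]
res[, v := sprintf("grp%02d_%d", grp, k)]

# One output row per input row, columns grp01_1, grp01_2, ..., grp20_3
wide <- dcast(res, row ~ v, value.var = "val")[, row := NULL][]
```

Here the column suffix is the index of the value returned by myfunc, so grp01_1 is the mean of columns 1 to 100 for each input row.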

Input:

d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000)  set(d,j=as.character(c), value=runif(3))

myfunc function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))
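
If you prefer to avoid reshaping, the same idea can be sketched in base R by looping over the 20 column blocks only and letting apply handle the rows. This is a rough sketch, not a benchmarked solution: it assumes the data can be held as a numeric matrix (called m here), that the function always returns exactly 3 values, and the names n_per_grp, starts, blocks and out are made up for illustration.

```r
# Toy stand-in for the real function (same as in this answer)
myfunc <- function(x) c(mean(x), var(x), sd(x))

# Fake data: 3 rows x 2000 columns
set.seed(123)
m <- matrix(runif(3 * 2000), nrow = 3)

n_per_grp <- 100
starts <- seq(1, ncol(m), by = n_per_grp)  # 1, 101, ..., 1901

# For each block of 100 columns, apply myfunc to every row.
# apply() returns a 3 x nrow matrix here, so transpose it to nrow x 3.
blocks <- lapply(starts, function(s) {
  t(apply(m[, s:(s + n_per_grp - 1), drop = FALSE], 1, myfunc))
})

# Bind the 20 blocks side by side and name the columns as in the question
out <- as.data.frame(do.call(cbind, blocks))
names(out) <- paste0("grp", rep(seq_along(starts), each = 3),
                     "_", rep(1:3, times = length(starts)))
```

The result out has one row per input row and 60 columns named grp1_1, grp1_2, grp1_3, ..., grp20_3.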
