Applying a function to every row on each n number of columns in R
My data contains consecutive columns 1, 2, ..., 2000. I want to apply a function that returns 3 values for each group of 100 columns, for each row.
The data look like this:
1 2 3 ..... 2000
0.01 0.0 0.002 0.03
0.005 0.002 0.011 0.04
0.001 0.003 0.004 0.0
Here is the code I tried:
prep_data <- function(df){
  # Create column names grp1_1, grp1_2, ..., grp20_3
  colnms <- c()
  for(i in seq(1, 20, 1)){
    for(j in seq(1, 3, 1)){
      f <- paste0("grp", i, "_", j)
      colnms <- c(colnms, f)
    }
  }
  # Pre-allocate the 60-column result, one row per input row
  trans <- data.frame(matrix(ncol = 60, nrow = NROW(df)))
  colnames(trans) <- colnms
  # Looping over every row
  for (i in 1:NROW(df)){
    X <- c()
    # Looping over each group of 100 columns (start columns 1, 101, ..., 1901)
    for(j in seq(1, 1901, 100)){
      end <- j + 99
      tmp <- df[i, j:end]
      # Here I apply the function over the 100 columns for the current row to get 3 values
      X <- c(X, MY_FUNC(tmp))
    }
    # Append the current row
    trans[i,] <- X
  }
  return(trans)
}
The expected output (a data frame of 60 columns) is as follows:
grp1_1 grp1_2 grp1_3 ..... grp20_3
0.01 0.0 0.002 0.03
0.005 0.002 0.011 0.04
0.001 0.003 0.004 0.0
My code runs, but it's too slow, probably because all the loops make it inefficient.
Thanks in advance.
Here is one approach:
Let d be your 3 rows x 2000 columns frame, with column names as.character(1:2000) (see below for generation of fake data). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then apply your function myfunc (see below for a stand-in function for this example), by row and group, and swing wide. (I used stringr::str_pad to add 0 to the front of the group number.)
# add row identifier
d[, row := .I]
# melt long and add the column-group identifier
# (each group covers 100 columns x 3 rows = 300 values)
dm = melt(d, id.vars = "row", variable.factor = FALSE)[, variable := as.numeric(variable)][order(variable, row), grp := rep(1:20, each = 300)]
# get the result (180 rows long), applying myfunc to each set of columns, by row;
# frow indexes the three values that myfunc returns
result = dm[, myfunc(value), by = .(row, grp)][, frow := rep(1:3, times = 60)]
# swing wide (3 rows long, 60 columns wide): one row per myfunc value (frow),
# one column per column group and input row
dcast(
  result[, v := paste0("grp", stringr::str_pad(grp, 2, pad = "0"), "_", row)],
  frow ~ v, value.var = "V1"
)[, frow := NULL][]
Output (first six columns only):
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
Input:
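library(data.table)
# empty data.table with room pre-allocated for 2000 columns
d = data.table()
alloc.col(d, 2000)
set.seed(123)
# add 2000 columns named "1".."2000", each holding 3 uniform random values
for(c in 1:2000) set(d, j = as.character(c), value = runif(3))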
myfunc function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))
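For comparison, the same grouping can be sketched in base R without reshaping to long format. This is an alternative technique, not the method used in the answer above. It assumes the data are numeric and fit in a matrix, that the number of columns is an exact multiple of the group size, and that the function always returns the same number of values. The names apply_by_col_groups, group_size and n_out are made up for this illustration. Note that the data.table pipeline above returns one row per myfunc value, whereas this sketch keeps one output row per input row, as in the question's expected output.

```r
# Stand-in for the real function (same toy example as in the answer)
myfunc <- function(x) c(mean(x), var(x), sd(x))

# Hypothetical helper: apply f to each block of group_size columns, row by row.
# Assumes ncol(m) is a multiple of group_size and f returns n_out numbers.
apply_by_col_groups <- function(m, f, group_size = 100, n_out = 3) {
  m <- as.matrix(m)
  stopifnot(ncol(m) %% group_size == 0)
  n_grp <- ncol(m) %/% group_size
  # Build one (nrow x n_out) block per column group
  blocks <- lapply(seq_len(n_grp), function(g) {
    cols <- ((g - 1) * group_size + 1):(g * group_size)
    vals <- vapply(seq_len(nrow(m)), function(i) f(m[i, cols]), numeric(n_out))
    # vals is n_out x nrow; transpose so rows match the input rows
    t(matrix(vals, nrow = n_out))
  })
  # Bind the blocks side by side: grp1_1, grp1_2, grp1_3, grp2_1, ...
  out <- do.call(cbind, blocks)
  colnames(out) <- paste0("grp", rep(seq_len(n_grp), each = n_out),
                          "_", rep(seq_len(n_out), times = n_grp))
  as.data.frame(out)
}

set.seed(123)
m <- matrix(runif(3 * 2000), nrow = 3)  # fake data: 3 rows x 2000 columns
res <- apply_by_col_groups(m, myfunc)
dim(res)  # 3 60
```

Here the remaining loops run over the 20 column groups and the rows, and each call to the function receives a plain numeric vector rather than a one-row data frame. Whether this is faster than the data.table version will depend on the real function and the number of rows.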