
Divide into bins in R

I have the following data:

A   1   6
A   2   72
A   3   90
A   4   81
A   5   81
A   6   42
A   7   12
A   8   32
A   9   34
A   10  92
B   1   44
B   2   54
B   3   10
B   4   21
B   5   47
B   6   35
B   7   94
B   8   5
B   9   35
B   10  77
B   11  9
B   12  52
B   13  73
B   14  93
B   15  38
B   16  85
B   17  90
B   18  47

My output must be:

A   1   6   1
A   2   72  1
A   3   90  2
A   4   81  2
A   5   81  3
A   6   42  3
A   7   12  4
A   8   32  4
A   9   34  5
A   10  92  5
B   1   44  1
B   2   54  1
B   3   10  1
B   4   21  1
B   5   47  1
B   6   35  2
B   7   94  2
B   8   5   2
B   9   35  2
B   10  77  3
B   11  9   3
B   12  52  3
B   13  73  3
B   14  93  4
B   15  38  4
B   16  85  4
B   17  90  4
B   18  47  4

The bin (last) column must be calculated from the number of rows for each item in the first column. So for A, 10/5 = 2 rows in each bin.

For B, 18/5 = 3.6 rows in each bin.

I tried using seq, bin = seq(from=, to=, by=), but I am not sure how to proceed. Any help would be appreciated. Thank you.
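The seq idea can be carried through for a single group roughly as follows. This is only a sketch: the names n, n_bins, breaks and bin are made up for illustration, and it bins rows by position within the group, which is an assumption about what is wanted.

```r
# Number of rows in one group (18 for B in the data above) and bins wanted.
n <- 18
n_bins <- 5

# Six evenly spaced break points from 0 to n give five intervals of width n / n_bins.
breaks <- seq(from = 0, to = n, length.out = n_bins + 1)

# left.open = TRUE makes each interval (lower, upper], so row n lands in the last bin.
bin <- findInterval(seq_len(n), breaks, left.open = TRUE)
print(bin)
```

Because 18/5 is not a whole number, the bins cannot all be the same size; here they hold 3, 4, 3, 4 and 4 rows.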

You can follow the approach from here, using ave to apply the function for each group in your data.

cbind(dat, bin=ave(dat$V2, dat$V1, FUN=function(x) ceiling(seq_along(x)/length(x)*5)))
#    V1 V2 V3 bin
# 1   A  1  6   1
# 2   A  2 72   1
# 3   A  3 90   2
# 4   A  4 81   2
# 5   A  5 81   3
# 6   A  6 42   3
# 7   A  7 12   4
# 8   A  8 32   4
# 9   A  9 34   5
# 10  A 10 92   5
# 11  B  1 44   1
# 12  B  2 54   1
# 13  B  3 10   1
# 14  B  4 21   2
# 15  B  5 47   2
# 16  B  6 35   2
# 17  B  7 94   2
# 18  B  8  5   3
# 19  B  9 35   3
# 20  B 10 77   3
# 21  B 11  9   4
# 22  B 12 52   4
# 23  B 13 73   4
# 24  B 14 93   4
# 25  B 15 38   5
# 26  B 16 85   5
# 27  B 17 90   5
# 28  B 18 47   5
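For anyone who wants to rerun this, here is a self-contained sketch that rebuilds the example data and tabulates the bin sizes. The values and the column names V1, V2, V3 are copied from the printed output above; the name bin_sizes is made up for illustration.

```r
# Rebuild the example data; V1, V2, V3 are the column names
# shown in the printed output above.
dat <- data.frame(
  V1 = rep(c("A", "B"), times = c(10, 18)),
  V2 = c(1:10, 1:18),
  V3 = c(6, 72, 90, 81, 81, 42, 12, 32, 34, 92,
         44, 54, 10, 21, 47, 35, 94, 5, 35, 77, 9, 52, 73, 93, 38, 85, 90, 47)
)

# Position-based bin within each V1 group, five bins per group.
dat$bin <- ave(dat$V2, dat$V1,
               FUN = function(x) ceiling(seq_along(x) / length(x) * 5))

# Rows per bin in each group.
bin_sizes <- table(dat$V1, dat$bin)
print(bin_sizes)
```

A gets 2 rows per bin, and B gets bins of 3, 4, 3, 4 and 4 rows. Note that this is not the same as the B column in the question's desired output, which uses only four bins of 5, 4, 4 and 5 rows.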

Using data.table:

library(data.table)
# .N is the number of rows in the current V1 group
setDT(x)[, output := ceiling(5 * (1:.N) / .N), by = V1]
> x
    V1 V2 V3 output
 1:  A  1  6      1
 2:  A  2 72      1
 3:  A  3 90      2
 4:  A  4 81      2
 5:  A  5 81      3
 6:  A  6 42      3
 7:  A  7 12      4
 8:  A  8 32      4
 9:  A  9 34      5
10:  A 10 92      5
11:  B  1 44      1
12:  B  2 54      1
13:  B  3 10      1
14:  B  4 21      2
15:  B  5 47      2
16:  B  6 35      2
17:  B  7 94      2
18:  B  8  5      3
19:  B  9 35      3
20:  B 10 77      3
21:  B 11  9      4
22:  B 12 52      4
23:  B 13 73      4
24:  B 14 93      4
25:  B 15 38      5
26:  B 16 85      5
27:  B 17 90      5
28:  B 18 47      5
    V1 V2 V3 output

I tried this:

gene <- split(df, df$Gene)
g <- gene[1]
g1 <- as.data.frame(g)

FindBin <- function(data) {
  START <- 0
  END <- length(data$A.Base)
  noOfBin <- 20
  jump <- END / noOfBin
  bin <- seq(from = START, to = END, by = jump)
  # use the data argument (not the global g1) and return the bin indices
  findInterval(data$A.Base, bin)
}
g1$m1bin <- FindBin(g1)

Now I get the bins, but since I have split the df by gene, how do I run this over all of the split data frames?
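One possible way to do that is to lapply a binning function over the pieces returned by split and then stack them again. The sketch below uses a toy df with hypothetical columns Gene and Base (names borrowed from the attempt above, values made up). The helper find_bin is also hypothetical and bins by row position with five bins, rather than using findInterval on the Base values.

```r
# Toy stand-in for the asker's data; the values are invented for illustration.
df <- data.frame(
  Gene = rep(c("A", "B"), times = c(10, 18)),
  Base = c(1:10, 1:18)
)

# Bin one gene's rows into n_bins groups by row position.
find_bin <- function(data, n_bins = 5) {
  ceiling(seq_len(nrow(data)) / nrow(data) * n_bins)
}

# Apply the function to every piece of the split, then stack the pieces.
pieces <- lapply(split(df, df$Gene), function(g) {
  g$bin_index <- find_bin(g)
  g
})
binned <- do.call(rbind, pieces)
rownames(binned) <- NULL

print(table(binned$Gene, binned$bin_index))
```

The same result can be had without splitting by hand, since ave and data.table's by argument (both shown earlier) already apply a function per group.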


 