Divide into bins in R
I have the following data:
A 1 6
A 2 72
A 3 90
A 4 81
A 5 81
A 6 42
A 7 12
A 8 32
A 9 34
A 10 92
B 1 44
B 2 54
B 3 10
B 4 21
B 5 47
B 6 35
B 7 94
B 8 5
B 9 35
B 10 77
B 11 9
B 12 52
B 13 73
B 14 93
B 15 38
B 16 85
B 17 90
B 18 47
My output must be:
A 1 6 1
A 2 72 1
A 3 90 2
A 4 81 2
A 5 81 3
A 6 42 3
A 7 12 4
A 8 32 4
A 9 34 5
A 10 92 5
B 1 44 1
B 2 54 1
B 3 10 1
B 4 21 1
B 5 47 1
B 6 35 2
B 7 94 2
B 8 5 2
B 9 35 2
B 10 77 3
B 11 9 3
B 12 52 3
B 13 73 3
B 14 93 4
B 15 38 4
B 16 85 4
B 17 90 4
B 18 47 4
The last column (bin) must be calculated from the number of rows for each item in the first column. A has 10 rows, so 10/5 = 2 rows in each bin. B has 18 rows, so 18/5 = 3.6 rows in each bin.
I tried using seq, bin = seq(from=, to=, by=), but I am not sure how to proceed. Any help would be appreciated. Thank you.
You can follow the approach from here, using ave to apply the function to each group in your data.
cbind(dat, bin=ave(dat$V2, dat$V1, FUN=function(x) ceiling(seq_along(x)/length(x)*5)))
# V1 V2 V3 bin
# 1 A 1 6 1
# 2 A 2 72 1
# 3 A 3 90 2
# 4 A 4 81 2
# 5 A 5 81 3
# 6 A 6 42 3
# 7 A 7 12 4
# 8 A 8 32 4
# 9 A 9 34 5
# 10 A 10 92 5
# 11 B 1 44 1
# 12 B 2 54 1
# 13 B 3 10 1
# 14 B 4 21 2
# 15 B 5 47 2
# 16 B 6 35 2
# 17 B 7 94 2
# 18 B 8 5 3
# 19 B 9 35 3
# 20 B 10 77 3
# 21 B 11 9 4
# 22 B 12 52 4
# 23 B 13 73 4
# 24 B 14 93 4
# 25 B 15 38 5
# 26 B 16 85 5
# 27 B 17 90 5
# 28 B 18 47 5
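To see why the expression inside ave produces these bins, it may help to run it on one group at a time. The sketch below wraps it in a helper; the name bin_index and the n_bins argument are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: assign each position in one group to one of n_bins bins
bin_index <- function(x, n_bins = 5) {
  # seq_along(x) / length(x) runs from 1/n up to 1, so scaling by n_bins
  # and rounding up gives integers between 1 and n_bins
  ceiling(seq_along(x) / length(x) * n_bins)
}

a <- bin_index(1:10)  # group A has 10 rows
b <- bin_index(1:18)  # group B has 18 rows

table(a)  # 2 rows in every bin
table(b)  # bin sizes 3, 4, 3, 4, 4
```

When the group length is not a multiple of 5, as for B, the bins cannot all be the same size, so the rounding spreads the 18 rows as evenly as it can across 5 bins.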
Using data.table:
library(data.table)
setDT(x)[,output:=ceiling(5*(1:.N)/.N),by=V1]
> x
V1 V2 V3 output
1: A 1 6 1
2: A 2 72 1
3: A 3 90 2
4: A 4 81 2
5: A 5 81 3
6: A 6 42 3
7: A 7 12 4
8: A 8 32 4
9: A 9 34 5
10: A 10 92 5
11: B 1 44 1
12: B 2 54 1
13: B 3 10 1
14: B 4 21 2
15: B 5 47 2
16: B 6 35 2
17: B 7 94 2
18: B 8 5 3
19: B 9 35 3
20: B 10 77 3
21: B 11 9 4
22: B 12 52 4
23: B 13 73 4
24: B 14 93 4
25: B 15 38 5
26: B 16 85 5
27: B 17 90 5
28: B 18 47 5
V1 V2 V3 output
I tried this:
split(df,df$Gene) -> gene
gene[1] -> g
as.data.frame(g) -> g1
FindBin = function(data){
  START=0
  END=length(g1$A.Base)
  noOfBin=20
  jump=END/noOfBin
  bin = seq(from=START, to=END, by=jump)
  g1$bin_index = findInterval(g1$A.Base, bin)
}
g1$m1bin=FindBin(g1)
Now I get the bins, but since I have split df into separate genes, how do I run this over all of the split data frames?
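One possible way to run the binning over every piece is to pass the list returned by split to lapply and stack the results again. This is only a sketch under assumptions: the toy df below is invented, find_bin is a hypothetical rewrite of the function above that uses its argument instead of the global g1, and A.Base is assumed to run from 1 to the number of rows within each gene. It also sets left.open = TRUE in findInterval so that the last position falls in bin 20 rather than in a 21st bin.

```r
# Toy stand-in for df; the column names Gene and A.Base follow the attempt
# above, but the values are invented for illustration
df <- data.frame(Gene   = rep(c("g1", "g2"), times = c(40, 60)),
                 A.Base = c(1:40, 1:60))

# Hypothetical helper: bin the positions of one gene into n_bins bins
find_bin <- function(data, n_bins = 20) {
  breaks <- seq(from = 0, to = nrow(data), length.out = n_bins + 1)
  # left.open = TRUE makes the intervals (a, b], so the last position
  # lands in bin n_bins
  data$bin_index <- findInterval(data$A.Base, breaks, left.open = TRUE)
  data
}

# Apply the helper to every gene, then stack the pieces back together
binned <- do.call(rbind, lapply(split(df, df$Gene), find_bin))
```

The ave and data.table answers above do the same per-group work without an explicit split, so they may be the simpler choice when the bins depend only on row order.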