
How to count number of instances above a value within a given range in R?

I have a rather large dataset of SNPs across an entire genome. I am trying to generate a heatmap whose colour scales with how many SNPs have a BF (Bayes factor) value over 50 within each window of x base pairs along the genome. For example, there might be 5 SNPs of interest within the first 1,000,000 base pairs, then 3 in the next 1,000,000, and so on until I reach the end of the genome; these counts would be used to generate a single-row heatmap. Currently, my data are set out like so:

SNP           BF            BP
0001_107388   11.62814713   107388
0001_193069   2.333472447   193069
0001_278038   51.34452334   278038
0001_328786   5.321968927   328786
0001_523879   50.03245434   523879
0001_804477   -0.51777189   804477
0001_990357   6.235452787   990357
0001_1033297  3.08206707    1033297
0001_1167609  -2.427835577  1167609
0001_1222410  52.96447989   1222410
0001_1490205  10.98099565   1490205
0001_1689133  3.75363951    1689133
0001_1746080  3.519987207   1746080
0001_1746450  -2.86666016   1746450
0001_1777011  0.166999413   1777011
0001_2114817  3.266942137   2114817
0001_2232084  50.43561123   2232084
0001_2332903  -0.15022324   2332903
0001_2347062  -1.209000033  2347062
0001_2426273  1.230915683   2426273

where SNP = the SNP ID, BF = the Bayes factor, and BP = the position on the genome (I've fudged a couple of > 50 values in there so the data are suitable for this example).

The issue is that I don't have a SNP for each genome position; otherwise I could simply split the windows of interest by line count and then count how many lines in the BF column are over 50. Is there any way I can count the number of SNPs of interest within different windows of genome positions? Preferably in R, but I have no issue with using other languages like Python or Bash if they get the job done.

Thanks!

library(slider)
library(dplyr)
# slide_index_int() returns an integer vector (plain slide_index() returns a list)
my_data %>%
  mutate(count = slide_index_int(BF, BP, ~ sum(.x > 50), .before = 999999))

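For each SNP, this counts how many SNPs have BF > 50 within the trailing 1,000,000 base pairs ending at that SNP's position, i.e. BP - 999,999 through BP inclusive. The index-based sliding functions in slider require BP to be sorted in ascending order.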

            SNP         BF      BP count
1   0001_107388 11.6281471  107388     0
2   0001_193069  2.3334724  193069     0
3   0001_278038 51.3445233  278038     1
4   0001_328786  5.3219689  328786     1
5   0001_523879 50.0324543  523879     2
6   0001_804477 -0.5177719  804477     2
7   0001_990357  6.2354528  990357     2
8  0001_1033297  3.0820671 1033297     2
9  0001_1167609 -2.4278356 1167609     2
10 0001_1222410 52.9644799 1222410     3
11 0001_1490205 10.9809957 1490205     2
12 0001_1689133  3.7536395 1689133     1
13 0001_1746080  3.5199872 1746080     1
14 0001_1746450 -2.8666602 1746450     1
15 0001_1777011  0.1669994 1777011     1
16 0001_2114817  3.2669421 2114817     1
17 0001_2232084 50.4356112 2232084     1
18 0001_2332903 -0.1502232 2332903     1
19 0001_2347062 -1.2090000 2347062     1
20 0001_2426273  1.2309157 2426273     1
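The question also describes fixed, non-overlapping windows (the first 1,000,000 base pairs, then the next 1,000,000, and so on). A minimal base R sketch of that variant follows, as an alternative to the sliding count above. It assumes windows start at position 0 and are half-open, so a SNP at BP = 1,000,000 falls in the second window; the names window_size, window_id and window_counts are illustrative and do not come from the original post.

```r
# Example data copied from the question (BP = position, BF = Bayes factor).
my_data <- data.frame(
  BP = c(107388, 193069, 278038, 328786, 523879, 804477, 990357,
         1033297, 1167609, 1222410, 1490205, 1689133, 1746080, 1746450,
         1777011, 2114817, 2232084, 2332903, 2347062, 2426273),
  BF = c(11.62814713, 2.333472447, 51.34452334, 5.321968927, 50.03245434,
         -0.51777189, 6.235452787, 3.08206707, -2.427835577, 52.96447989,
         10.98099565, 3.75363951, 3.519987207, -2.86666016, 0.166999413,
         3.266942137, 50.43561123, -0.15022324, -1.209000033, 1.230915683)
)

window_size <- 1e6  # assumed window width in base pairs

# Integer division assigns each SNP to a window:
# 0 = [0, 1e6), 1 = [1e6, 2e6), and so on.
window_id <- as.integer(my_data$BP %/% window_size)

# Count SNPs with BF > 50 per window. tabulate() keeps windows with
# zero hits, which matters for a heatmap with one cell per window.
n_windows <- max(window_id) + 1L
hits <- tabulate(window_id[my_data$BF > 50] + 1L, nbins = n_windows)

window_counts <- data.frame(
  window_start = (seq_len(n_windows) - 1) * window_size,
  n_hits       = hits
)
print(window_counts)
```

On the example data this gives 2, 1 and 1 hits for the three windows. The n_hits column can then supply the values for a single-row heatmap, one cell per window.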
