
Plotting a million points in R?

I have a tab-delimited text file with 3 columns, A, B and C:

A                      B                      C
0.07142857142857142    0.35714285714285715    0.21428571428571427
0.0                    0.3333333333333333     0.3888888888888889
0.07142857142857142    0.35714285714285715    0.21428571428571427
0.0                    0.3333333333333333     0.3888888888888889

Each row represents a sample with 3 different percentages A, B and C. In total I have 4 files for 4 different organisms. There can be more than a million rows per file.

My idea is to plot each row to see the distribution of the (A, B, C) triples in a given file, determine the most frequent triple in that file, and then compare the 4 files.

I tried plotting these points in R for each file (several curves on the same graph: A, B and C on the y axis and the sample number on the x axis), but there are so many points that the graph can't be interpreted. Also, for the million-row file, R crashes and won't plot the points.

What would be the best approach to represent these points? Also, is the mode function enough to determine the most frequent triple (A, B, C), or is there an appropriate statistical test I could try?

Any help would be much appreciated.

Thanks.

As I mentioned in my comment, clustering may be a solution to your problem. Here is one way of clustering, using kmeans:

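library(ggplot2)
set.seed(42)  # kmeans starts from random centres; any fixed seed makes the run repeatable
# Cluster on the four numeric columns and attach each row's cluster label
irisCl <- transform(iris, Cluster = kmeans(iris[1:4], 3)$cluster)
# One panel per cluster, points coloured by the true species
ggplot(irisCl, aes(Sepal.Length, Sepal.Width, colour = Species)) + geom_point() + facet_grid(~Cluster)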

[Figure: kmeans clusters of the iris data, one panel per cluster, points coloured by species]

Note that we have clustered in a 4-dimensional variable space. Cluster numbers depend on the random starting centres, so they can differ between runs. In the run shown, the setosa are identified correctly in the first cluster, the second cluster contains only virginica, and the third cluster contains a mixture of versicolor and virginica.
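
The same idea could be applied to the three columns from the question. The following is only a rough sketch under several assumptions: the data are simulated stand-ins (the real rows would be read from the tab-delimited files), the choice of 3 clusters is arbitrary, and the names dat, centres, counts and top are made up for illustration. It summarises each cluster by its centre and size rather than plotting every row, and it counts identical rows to find the most frequent exact (A, B, C) triple. Note that base R's mode() reports an object's storage mode, not its most common value, so a count of this kind is needed instead.

```r
# Simulated stand-in for one file (an assumption); real data would come from
# something like read.delim() on the tab-delimited file.
set.seed(1)
n <- 20000
centres <- rbind(c(0.07, 0.36, 0.21),
                 c(0.10, 0.33, 0.39),
                 c(0.20, 0.10, 0.50))
idx <- sample(nrow(centres), n, replace = TRUE)
noise <- matrix(rnorm(3 * n, sd = 0.01), ncol = 3)
dat <- as.data.frame(round(centres[idx, ] + noise, 2))
names(dat) <- c("A", "B", "C")

# Cluster in the 3-dimensional (A, B, C) space; k = 3 is an assumption
km <- kmeans(dat, centers = 3, nstart = 5, iter.max = 50)

# Summarise each cluster by its centre and size instead of plotting every row
summary_tbl <- data.frame(km$centers, size = km$size)
print(summary_tbl[order(-summary_tbl$size), ])

# Most frequent exact (A, B, C) triple: count identical rows
counts <- aggregate(count ~ A + B + C, data = cbind(dat, count = 1L), FUN = sum)
top <- counts[which.max(counts$count), ]
print(top)
```

Running this once per file would give four small tables of cluster centres and sizes, which are easier to compare than four plots of a million points each. Counting exact triples only makes sense if identical rows really do repeat, as they do in the sample shown in the question; otherwise the values would need rounding or binning first.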
