Plotting a matrix "in parts" in R?

Question

I have a 50k by 50k square matrix saved to disk as a text file, and I would like to produce a simple histogram to see the distribution of the values in the matrix.

When I try to load the matrix in R using read.table(), I get a memory error because the matrix is too big. Is there any way I could load smaller submatrices one at a time, but still produce a histogram that considers all the values of the original matrix? I can load smaller submatrices, but each time I just overwrite the histogram I had for the previous submatrix with the distribution of the new one.

Answer 1

Here's an approach. I don't have all the details because you did not provide sample data or the expected output, but one way to do this is with the read_csv_chunked function in the readr package. First, write a summarisation function, then apply it to each chunk as it is read. See below for a full reprex.


# Call the Required Libraries
library(dplyr)
library(ggplot2)
library(readr)

# First Generate Some Fake Data
temp <- tempfile(fileext = ".csv")

fake_dat <- as.data.frame(matrix(rnorm(1000*100), ncol = 100))
write_csv(fake_dat, temp)



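# Now write a summarisation function
# This will be applied to each chunk that is read into
# memory. It pools the values from every column of the
# chunk, so the final histogram covers the whole matrix
# rather than a single column.
summarise_for_hist <- function(x, pos){
  tibble(value = unlist(x, use.names = FALSE)) %>% 
    mutate(added_bin = cut(value, breaks = -6:6)) %>% 
    count(added_bin)
}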

# Note that I manually set the cutpoints or "breaks"
# argument. You would need to refine this based on your
# data and subject matter expertise

# Apply the summarisation function to each chunk as it is read

small_read <- read_csv_chunked(temp, # data
                               DataFrameCallback$new(summarise_for_hist),
                               chunk_size = 200 # number of lines to read
                               )

Now that we have summarised our data, we can combine the per-chunk counts and plot them.


# Generate our histogram by combining all of the results
# and plotting

small_read %>% 
  group_by(added_bin) %>% 
  summarise(total = sum(n)) %>% 
  ggplot(aes(added_bin, total))+
  geom_col()

This yields a bar chart with one bar per bin, where each bar's height is the total count for that bin across all chunks.
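The same chunk-and-accumulate idea can also be sketched in base R, which may suit a matrix file written without a header. This is only an illustration under assumptions that are not in the original post: the file is taken to hold whitespace-separated numbers, `chunked_hist` and `lines_per_chunk` are hypothetical names, and the breaks are fixed in advance. Values outside the breaks are silently dropped rather than counted.

```r
# Sketch: accumulate histogram counts chunk by chunk using base R only.
# Assumes a whitespace-separated numeric file with no header row.
chunked_hist <- function(path, breaks, lines_per_chunk = 1000L) {
  # Bin labels in the same "(a,b]" style that cut() produces
  bin_labels <- levels(cut(numeric(0), breaks = breaks))
  counts <- numeric(length(bin_labels))   # running total per bin

  con <- file(path, open = "r")
  on.exit(close(con))

  repeat {
    lines <- readLines(con, n = lines_per_chunk)
    if (length(lines) == 0L) break        # end of file
    # Parse every number in this block of rows into one vector
    values <- scan(text = lines, quiet = TRUE)
    # Count per bin; values outside the breaks become NA and are not counted
    counts <- counts + as.vector(table(cut(values, breaks = breaks)))
  }

  data.frame(bin = bin_labels, count = counts)
}

# Small demonstration on a fake 100 x 20 matrix
demo_file <- tempfile(fileext = ".txt")
write.table(matrix(rnorm(100 * 20), ncol = 20), demo_file,
            row.names = FALSE, col.names = FALSE)
res <- chunked_hist(demo_file, breaks = -6:6, lines_per_chunk = 25L)
barplot(res$count, names.arg = res$bin, las = 2)
```

Because only one block of rows and a single vector of counts are in memory at a time, memory use depends on `lines_per_chunk` and not on the size of the full matrix. For a 50k-column file, a much smaller `lines_per_chunk` than the default would likely be appropriate.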

3 2019-11-03 15:53:11
