Plotting a matrix "in parts" in R?

Question

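I have a 50k x 50k square matrix saved to disk as a text file, and I want to produce a simple histogram to see the distribution of the values in the matrix.

Obviously, when I try to load the matrix into R with read.table(), I run into memory errors because the matrix is too large. Is there a way to load one smaller submatrix at a time and still produce a histogram that takes every value of the original matrix into account? I can load smaller submatrices, but the distribution of each new submatrix simply overwrites the histogram I had for the previous one.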

Answer 1

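Here is one approach. I don't have all the details, since you didn't provide sample data or expected output, but one way is to use the read_csv_chunked function from the readr package. First you write a summary function, which is then applied to each chunk as it is read. See the full reprex below.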


# Call the Required Libraries
library(dplyr)
library(ggplot2)
library(readr)

# First Generate Some Fake Data
temp <- tempfile(fileext = ".csv")

fake_dat <- as.data.frame(matrix(rnorm(1000*100), ncol = 100))
write_csv(fake_dat, temp)



# Now write a summarisation function
# This will be applied to each chunk that is read into
# memory
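summarise_for_hist <- function(x, pos){
  # Pool every column of the chunk (not just V1) so that all
  # matrix values are counted; values outside the breaks become NA
  data.frame(added_bin = cut(unlist(x, use.names = FALSE), breaks = -6:6)) %>% 
    count(added_bin)
}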

# Note that I manually set the cutpoints or "breaks"
# argument. You would need to refine this based on your
# data and subject matter expertise

# Apply the summary function to each chunk as it is read

small_read <- read_csv_chunked(temp, # data
                               DataFrameCallback$new(summarise_for_hist),
                               chunk_size = 200 # number of lines to read
                               )

Now that the data has been summarised chunk by chunk, we can combine the per-chunk counts and plot them.


# Generate our histogram by combining all of the results
# and plotting

small_read %>% 
  group_by(added_bin) %>% 
  summarise(total = sum(n)) %>% 
  ggplot(aes(added_bin, total))+
  geom_col()

This produces a bar chart showing the total count of values in each bin.
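If the file is a plain whitespace-delimited text matrix (the kind read.table() would read) rather than a CSV, the same idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the file holds numbers only, with no header row or row names; the breaks are chosen by hand and cover the data; and chunked_hist_counts is a hypothetical helper name, not a function from any package. It reads a block of rows at a time through an open connection and adds each block's bin counts to a running total, so the counts are accumulated instead of overwritten.

```r
# Hypothetical helper: accumulate histogram counts over blocks of rows.
# Assumes a purely numeric, whitespace-delimited file without a header.
chunked_hist_counts <- function(path, breaks, rows_per_chunk = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  counts <- numeric(length(breaks) - 1L)
  repeat {
    # On an open connection, scan() resumes where the last call stopped
    x <- scan(con, what = numeric(), nlines = rows_per_chunk, quiet = TRUE)
    if (length(x) == 0L) break
    # Bin the block with the shared breaks and add to the running totals;
    # values outside the breaks become NA and are not counted
    counts <- counts + as.vector(table(cut(x, breaks, include.lowest = TRUE)))
  }
  counts
}

# Small fake matrix standing in for the real 50k x 50k file
set.seed(1)
tmp <- tempfile(fileext = ".txt")
m <- matrix(rnorm(200 * 50), nrow = 200)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

breaks <- seq(-6, 6, by = 0.5)
counts <- chunked_hist_counts(tmp, breaks, rows_per_chunk = 30L)

# Draw the accumulated counts as a histogram-style bar chart
barplot(counts, names.arg = head(breaks, -1),
        xlab = "Lower bin edge", ylab = "Count")
```

Because only one block of rows and one vector of counts are in memory at any time, rows_per_chunk can be tuned to the available RAM. If the range of the data is not known in advance, a first pass that tracks the running minimum and maximum of each block could be used to choose the breaks.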

Solution 1
3 2019-11-03 15:53:11