Python (NumPy) crashes the system when handling a huge number of array elements

Question

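I'm trying to build a basic character-recognition model using several of the classifiers provided by scikit-learn. The dataset is a standard set of handwritten alphanumeric samples (the Chars74K image dataset, EnglishHnd.tgz).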

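There are 55 samples per character (62 alphanumeric characters in total), and each sample is 900x1200 pixels. I convert each image to grayscale and flatten the matrix into a 1x1080000 array, so each array is the feature vector of one sample.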
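The flattening step and the per-sample memory cost can be checked in isolation. A minimal sketch, assuming the grayscale image is a float64 array (which is what scikit-image's `rgb2gray` typically returns for 8-bit input); a zero array stands in for a real image:

```python
import numpy as np

# Stand-in for one grayscale sample: 900 rows x 1200 columns of float64
img_gray = np.zeros((900, 1200), dtype=np.float64)

# ravel() returns a flat 1-D view when possible (no copy for a contiguous array)
flat = img_gray.ravel()

print(flat.shape)   # (1080000,)
print(flat.nbytes)  # 8640000 bytes, roughly 8.6 MB per sample
```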

import numpy as np
from skimage.io import imread       # assumed: imread/rgb2gray come from scikit-image
from skimage.color import rgb2gray

n, m = 0, 0               # image height and width, taken from the first image
samples = np.array([])    # samples is the final numpy ndarray

for sample in sample_images:  # sample_images is the list of the .png files
    img = imread(sample)
    img_gray = rgb2gray(img)
    if n == 0 and m == 0:
        n, m = np.shape(img_gray)
    img_gray = np.reshape(img_gray, n * m)
    img_gray = np.append(img_gray, sample_id)  # sample_id stores the label of the training sample
    if len(samples) == 0:
        samples = np.append(samples, img_gray)
        samples = np.reshape(samples, [1, n * m + 1])
    else:
        samples = np.append(samples, [img_gray], axis=0)  # copies the whole array on every call

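So the final data structure should hold 55x62 = 3,410 arrays, each with 1,080,000 elements (plus the label). Only the final structure is kept; the intermediate matrices are locally scoped.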
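Because `np.append` allocates a new array and copies all existing data on each call, building the structure row by row copies the growing matrix thousands of times. When the number of samples is known in advance (3,410 here), a common alternative is to allocate the array once and fill it in place. A sketch with deliberately tiny, hypothetical dimensions so it runs anywhere; `load_sample` is a hypothetical stand-in for reading and flattening one image, and keeping labels in a separate array `y` is an assumption that matches scikit-learn's usual `(X, y)` inputs:

```python
import numpy as np

# Tiny hypothetical sizes; the real data would be 3410 samples x 1080000 pixels
num_samples, n, m = 6, 4, 5

def load_sample(i):
    """Hypothetical stand-in for imread + rgb2gray + flatten of image i."""
    return np.full(n * m, i, dtype=np.float32), i % 3  # (pixels, label)

# Allocate once: features and labels live in separate arrays
X = np.empty((num_samples, n * m), dtype=np.float32)
y = np.empty(num_samples, dtype=np.int64)

for i in range(num_samples):
    pixels, label = load_sample(i)
    X[i] = pixels  # writes into an existing row, no reallocation or copy of X
    y[i] = label

print(X.shape, y.tolist())  # (6, 20) [0, 1, 2, 0, 1, 2]
```

Preallocation removes the repeated copying, but not the size problem: at full resolution, even float32 storage for all 3,410 samples needs roughly 13.7 GiB, more than 8 GB of RAM.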

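I think the amount of data stored for training is very large, because the program never actually gets past a certain point, and it crashed my system so badly that I had to repair the BIOS!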

So far the program only collects the data to be passed to the classifier; the classification itself hasn't been added to the code yet.

Any suggestions on how to handle the data more efficiently?

Note: I use NumPy to store the final structure of flattened matrices. The system has 8 GB of RAM.

Answer 1

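This looks like a stack overflow. If I understand your question correctly, you have 3,682,800,000 array elements. What is the element type? If it is one byte, that is about 3.7 GB of data, more than enough to fill the stack (typically around 1 megabyte). Even at a single bit per element, you would still need about 460 MB. Try using heap memory (up to 8 GB on your machine).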
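The element-count arithmetic, and how much the element type matters, can be checked with NumPy's `itemsize`. NumPy allocates array data on the heap, so the practical ceiling is available RAM; and since `rgb2gray` typically returns float64, each element likely takes 8 bytes rather than 1. A short sketch:

```python
import numpy as np

# 55 samples x 62 characters x (900 * 1200) pixels per sample
n_elements = 55 * 62 * 900 * 1200
print(n_elements)  # 3682800000

# Total size of the pixel data for a few candidate element types
for dtype in (np.uint8, np.float32, np.float64):
    gib = n_elements * np.dtype(dtype).itemsize / 2**30
    print(f"{np.dtype(dtype).name:>8}: {gib:6.1f} GiB")
```

Only the uint8 case fits within 8 GB; the float64 case is several times larger than the machine's RAM.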

Answer 2

I was encouraged to post this as an answer, although the comments above may be more instructive.

The problem with your program is twofold. It really is simply overwhelming the stack.

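It is more common to process one image at a time, especially in image-processing fields such as computer graphics and computer vision. This works well with sklearn, where you can update the model as you read the images.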
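One way to read images incrementally is to yield small batches and hand each one to an estimator that supports incremental learning; in scikit-learn, estimators such as `SGDClassifier` and `MultinomialNB` expose `partial_fit` for this. A NumPy-only sketch of the batching side, where `load_sample`, the file-name pattern, and the batch size are hypothetical assumptions for illustration:

```python
import numpy as np

def load_sample(path):
    """Hypothetical loader: returns (flattened pixels, label) for one file."""
    label = int(path.split("_")[1])  # assumes names like "img_<label>_<index>.png"
    return np.full(4, label, dtype=np.float32), label

def iter_batches(paths, batch_size):
    """Yield (X, y) batches so only batch_size images are in memory at once."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        pairs = [load_sample(p) for p in chunk]
        X = np.stack([pixels for pixels, _ in pairs])
        y = np.array([label for _, label in pairs])
        yield X, y

paths = [f"img_{i % 3}_{i}.png" for i in range(7)]
batches = list(iter_batches(paths, batch_size=3))
print([X.shape for X, _ in batches])  # [(3, 4), (3, 4), (1, 4)]
```

With scikit-learn, each batch would typically be passed as `clf.partial_fit(X, y, classes=all_labels)`; `classes` must list every possible label on the first call, since a single batch may not contain all of them. In real use the batches would be consumed one by one in a loop rather than collected with `list(...)`.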

You can use the following code, found in this Stack Overflow post:

import os

import numpy as np
from skimage.io import imread       # assumed: imread/rgb2gray come from scikit-image
from skimage.color import rgb2gray

rootdir = '/path/to/my/pictures'
n, m = 0, 0               # image height and width, taken from the first image
samples = np.array([])    # samples is the final numpy ndarray

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.png'):  # or whatever your file type is / some check
            # do your training here
            # os.walk yields bare file names, so join them with their directory
            img = imread(os.path.join(subdir, file))

            img_gray = rgb2gray(img)
            if n == 0 and m == 0:
                n, m = np.shape(img_gray)
            img_gray = np.reshape(img_gray, n * m)

            # sample_id stores the label of the training sample
            img_gray = np.append(img_gray, sample_id)

            if len(samples) == 0:
                samples = np.append(samples, img_gray)
                samples = np.reshape(samples, [1, n * m + 1])
            else:
                samples = np.append(samples, [img_gray], axis=0)

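This is closer to pseudocode, but the general flow should be right. Let me know if there's anything else I can help with! If you're interested in some great deep learning algorithms, also check out OpenCV. There's a lot of cool stuff in it, and images make great sample data.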

Hope this helps.


2 answers

Answer 1
1 vote, accepted, 2017-02-28 04:22:27

Answer 2
0 votes, 2017-02-28 05:22:12
