Feature normalization on MNIST dataset

I am working with a subset of the MNIST dataset and want to normalize the features of its samples. The data is stored as .mat files. Can anyone guide me on how to convert a loaded .mat file to a numpy array so that I can perform basic operations such as computing the mean and standard deviation of the feature vectors?

This is my code for loading the .mat file and converting it to a numpy array:

import scipy.io
import numpy as np

train_0 = scipy.io.loadmat('data/training_data_0.mat')
train_1 = scipy.io.loadmat('data/training_data_1.mat')

test_0 = scipy.io.loadmat('data/testing_data_0.mat')
test_1 = scipy.io.loadmat('data/testing_data_1.mat')

# to return a group of the key-value
# pairs in the dictionary
result = train_0.items()

# Convert object to a list
data = list(result)

# Convert list to an array
numpyArray = np.array(data)

print(numpyArray.mean())

However, after execution I get this error:

  numpyArray = np.array(data)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/mish/Work/ASU/Fall20/CSE 569/main.py", line 20, in <module>
    print(numpyArray.mean())
  File "/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py", line 160, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: can only concatenate str (not "bytes") to str

You are passing a list of (key, value) tuples to numpy.array. The dictionary already holds the numpy array, so just use train_0['<some variable name here>'].

To get the variable names, use: print(train_0.keys())
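As a rough illustration of the advice above, the following sketch writes a tiny stand-in .mat file, loads it back, and pulls the array out by key. The variable name "features", the file name, and the 4x3 shape are hypothetical placeholders; the real files will contain whatever names print(train_0.keys()) reports.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small stand-in .mat file so the sketch is self-contained.
# The variable name "features" and the 4x3 shape are hypothetical.
tmp_dir = tempfile.mkdtemp()
mat_path = os.path.join(tmp_dir, "example.mat")
scipy.io.savemat(
    mat_path,
    {"features": np.arange(12, dtype=np.float64).reshape(4, 3)},
)

mat = scipy.io.loadmat(mat_path)

# loadmat also stores metadata under keys that start with "__"
# (__header__, __version__, __globals__). Their values are bytes, str
# and list objects, which is why averaging over all dictionary items
# ends up adding a str to bytes and fails.
variable_names = [name for name in mat if not name.startswith("__")]
print(variable_names)

# The value stored under a variable name is already a numpy.ndarray.
features = mat["features"]
print(type(features), features.shape)
print(features.mean(), features.std())
```

With the real data, the same indexing would be applied to train_0, train_1, test_0 and test_1 using the key names found in each file.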

This probably answers your question: Convert loaded mat file back to numpy array

scipy.io.loadmat returns a dictionary:

Returns
    mat_dict : dict
        dictionary with variable names as keys, and loaded matrices as values.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
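Once the feature matrix has been taken out of that dictionary, the normalization the question asks about might look like the sketch below. It assumes one sample per row and one feature per column, and it uses random data in place of the MNIST subset. The function name standardize and the eps threshold are illustrative choices, not part of any library.

```python
import numpy as np


def standardize(train, test, eps=1e-8):
    """Standardize each column using statistics of the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Constant features (for example, pixels that are always black)
    # have zero standard deviation; divide those by 1 instead of 0.
    std_safe = np.where(std < eps, 1.0, std)
    return (train - mean) / std_safe, (test - mean) / std_safe


# Random stand-in data: 100 training and 20 test samples, 5 features.
rng = np.random.default_rng(0)
train = rng.random((100, 5))
train[:, 0] = 0.0  # simulate a constant feature
test = rng.random((20, 5))

train_n, test_n = standardize(train, test)
print(train_n.mean(axis=0))
print(train_n.std(axis=0))
```

Reusing the training mean and standard deviation for the test set keeps both sets on the same scale; computing separate statistics for the test data would be a different design choice.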
