mean() got an unexpected keyword argument 'dtype'!

Question

I am trying to implement image classification using Intel BigDL. The example I am following uses the MNIST dataset for classification. Since I don't want to use the MNIST dataset, I wrote an alternative data loader, shown below:

imageUtils.py

from StringIO import StringIO
from PIL import Image
import numpy as np
from bigdl.util import common
from bigdl.dataset import mnist
from pyspark.mllib.stat import Statistics

def label_img(img):
    word_label = img.split('.')[-2].split('/')[-1]
    print word_label
    # conversion to one-hot array [cat,dog]
    #                            [much cat, no dog]
    if "jobs" in word_label: return [1,0]
    #                             [no cat, very doggo]
    elif "zuckerberg" in word_label: return [0,1]

    # target is start from 0,

def get_data(sc,path):
    img_dir = path
    train = sc.binaryFiles(img_dir + "/train")
    test = sc.binaryFiles(img_dir+"/test")
    image_to_array = lambda rawdata: np.asarray(Image.open(StringIO(rawdata)))

    train_data = train.map(lambda x : (image_to_array(x[1]),np.array(label_img(x[0]))))
    test_data = test.map(lambda x : (image_to_array(x[1]),np.array(label_img(x[0]))))

    train_images = train_data.map(lambda x : x[0])
    test_images = test_data.map((lambda x : x[0]))
    train_labels = train_data.map(lambda x : x[1])
    test_labels = test_data.map(lambda x : x[1])

    training_mean = np.mean(train_images)
    training_std = np.std(train_images)
    rdd_train_images = sc.parallelize(train_images)
    rdd_train_labels = sc.parallelize(train_labels)
    rdd_test_images = sc.parallelize(test_images)
    rdd_test_labels = sc.parallelize(test_labels)

    rdd_train_sample = rdd_train_images.zip(rdd_train_labels).map(lambda (features, label):
                                        common.Sample.from_ndarray(
                                        (features - training_mean) / training_std,
                                        label + 1))
    rdd_test_sample = rdd_test_images.zip(rdd_test_labels).map(lambda (features, label):
                                        common.Sample.from_ndarray(
                                        (features - training_mean) / training_std,
                                        label + 1))

    return (rdd_train_sample, rdd_test_sample)

Now, when I try to load the data from real images as below:

Classification.py

import pandas
import datetime as dt

from bigdl.nn.layer import *
from bigdl.nn.criterion import *
from bigdl.optim.optimizer import *
from bigdl.util.common import *
from bigdl.dataset.transformer import *
from bigdl.dataset import mnist
from imageUtils import get_data

from StringIO import StringIO
from PIL import Image
import numpy as np

init_engine()

path = "/home/fusemachine/Hyper/person"
(train_data, test_data) = get_data(sc,path)
print train_data.count()
print test_data.count()

I get the following error:

TypeError                                 Traceback (most recent call last)
in ()
      2 # Get and store MNIST into RDD of Sample, please edit the "mnist_path" accordingly.
      3 path = "/home/fusemachine/Hyper/person"
----> 4 (train_data, test_data) = get_data(sc,path)
      5 print train_data.count()
      6 print test_data.count()

/home/fusemachine/Downloads/dist-spark-2.1.0-scala-2.11.8-linux64-0.1.1-dist/imageUtils.py in get_data(sc, path)
     31 test_labels = test_data.map(lambda x : x[1])
---> 33 training_mean = np.mean(train_images)
     34 training_std = np.std(train_images)
     35 rdd_train_images = sc.parallelize(train_images)

/opt/anaconda3/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
   2884 pass
   2885 else:
-> 2886 return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   2887
   2888 return _methods._mean(a, axis=axis, dtype=dtype,

TypeError: mean() got an unexpected keyword argument 'dtype'

I could not figure out a solution for this. Also, is there an alternative to the MNIST dataset loader, so that we can process real images directly? Thank you.

Answer 1

train_images is an RDD, and you can't apply NumPy's mean to an RDD. One way is to call collect() and apply np.mean to the result:

 train_images = train_data.map(lambda x : x[0]).collect()
 training_mean = np.mean(train_images)

Or use the RDD's own mean():

  training_mean = train_images.mean()
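
To see why the error mentions 'dtype' at all, look at the frame at line 2886 in the traceback: when np.mean is handed something that is not an ndarray but has a mean attribute, it calls that object's own mean(axis=..., dtype=..., out=...). An RDD's mean() takes no such keywords, hence the TypeError. Here is a minimal sketch that reproduces the behaviour without Spark. TinyRDD is a hypothetical stand-in written only for this illustration; it is not part of pyspark or BigDL and only mimics the two methods used above.

```python
import numpy as np


class TinyRDD(object):
    """Hypothetical stand-in for a Spark RDD; NOT part of pyspark or BigDL."""

    def __init__(self, items):
        self._items = list(items)

    def collect(self):
        # Like RDD.collect(): hand the elements back as a local list.
        return list(self._items)

    def mean(self):
        # Like RDD.mean(): accepts no axis/dtype/out keywords.
        return sum(self._items) / float(len(self._items))


values = TinyRDD([1.0, 2.0, 3.0, 6.0])

# np.mean sees a non-ndarray with a .mean attribute and forwards the call as
# values.mean(axis=..., dtype=..., out=...), which the method rejects.
try:
    np.mean(values)
    failed = False
    message = ""
except TypeError as exc:
    failed = True
    message = str(exc)

# The two fixes from the answer, applied to the stand-in:
mean_via_collect = np.mean(values.collect())  # local list, plain NumPy mean
mean_via_method = values.mean()               # the object's own mean()
```

Which keyword the message names depends on the Python version, so the sketch only records that a TypeError was raised. Keep in mind that collect() pulls every image onto the driver, so it is probably only reasonable for small datasets. The same reasoning applies to np.std(train_images) on the next line of get_data.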
