
Issues appending numpy arrays during for loop

I'm a bit lost at the moment. I initialized an empty numpy array and I believe I'm using the np.append function correctly:

Preds = np.empty(shape = (X_test.shape[0],10))

kf = KFold(n = X_train.shape[0], n_folds=10, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds = np.append(Preds,dt.predict(X_test))

    print Preds

Just some additional info:

  • X_test has a shape of (9649, 24)

  • (After running) Preds has a shape of (192980,)

At the end of this loop, Preds should have a shape of (9649,10)

Any advice would be much appreciated.

EDIT: Here is the updated solution

Preds = []
kf = KFold(n = X_train.shape[0], n_folds=20, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds.append(dt.predict(X_test))

Preds = np.vstack(Preds)
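One thing worth checking in the updated code: np.vstack on a list of 1d prediction arrays gives one row per fold, i.e. a shape of (n_folds, n_test). If the (n_test, n_folds) layout asked for in the question is wanted, np.column_stack (or a transpose) should give it. A toy sketch, where the sizes are made up and random labels stand in for dt.predict(X_test):

```python
import numpy as np

n_test, n_folds = 6, 4            # toy stand-ins for 9649 test rows and the fold count
rng = np.random.default_rng(0)

preds = []
for _ in range(n_folds):
    # Hypothetical stand-in for dt.predict(X_test): one label per test row.
    preds.append(rng.integers(0, 2, size=n_test))

by_fold = np.vstack(preds)          # one row per fold
by_sample = np.column_stack(preds)  # one column per fold

print(by_fold.shape)    # (4, 6)
print(by_sample.shape)  # (6, 4)
```

Both arrays hold the same values; by_sample is simply by_fold transposed.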

If Preds is (9649,10), then you can do one of 2 kinds of concatenation

 newPreds = np.concatenate((Preds, np.zeros((N,10))), axis=0)
 newPreds = np.concatenate((Preds, np.zeros((9649,N))), axis=1)

The first produces a (9649+N, 10) array, the second a (9649, 10+N) array.
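The two cases can be checked on a small array. A minimal sketch, assuming a (5, 10) array of ones as a stand-in for the real Preds and N = 2:

```python
import numpy as np

N = 2
preds = np.ones((5, 10))  # toy stand-in for the (9649, 10) array

# axis=0 adds rows, so the new block must have the same number of columns.
more_rows = np.concatenate((preds, np.zeros((N, 10))), axis=0)

# axis=1 adds columns, so the new block must have the same number of rows.
more_cols = np.concatenate((preds, np.zeros((5, N))), axis=1)

print(more_rows.shape)  # (7, 10)
print(more_cols.shape)  # (5, 12)
```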

np.vstack can be used to make sure the second array is 2d, i.e. it changes a (10,) array to a (1,10) array. np.append takes 2 arguments instead of a list, and makes sure the second is an array. Without an axis argument it also flattens both inputs. It is better suited to adding a scalar to a 1d array than to general-purpose concatenation.
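The (192980,) shape from the question can be reproduced on a small scale. A minimal sketch, assuming toy sizes (4 test rows, 3 folds) in place of 9649 and 10, and a constant array as a stand-in for dt.predict(X_test):

```python
import numpy as np

n_test, n_folds = 4, 3  # toy stand-ins for 9649 and 10

# np.empty allocates n_test * n_folds uninitialized values, and they stay in the result.
preds = np.empty(shape=(n_test, n_folds))

for fold in range(n_folds):
    fake_prediction = np.full(n_test, fold)  # hypothetical stand-in for dt.predict(X_test)
    # With no axis argument, np.append flattens both inputs before joining them.
    preds = np.append(preds, fake_prediction)

print(preds.shape)  # (24,) = 4*3 uninitialized values + 3*4 appended values
```

The same arithmetic applies to the original sizes: 9649*10 values from np.empty plus 10 predictions of length 9649 gives 192980.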

Make sure you understand the shapes and number of dimensions of your arrays.

A good alternative is to append to a list:

alist = []
alist.append(initial_array)
for ...
    alist.append(next_array)
result = np.concatenate(alist, axis=?)
# vstack, stack, and np.array can be used if dimensions are right

Appending to a list, followed by one join at the end, is faster than repeated concatenation. Lists are designed to grow cheaply; arrays grow by making a new, larger array.
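The two approaches can be put side by side. This is only a sketch: the function names and the toy chunks are made up for illustration, and no timings are claimed here. It shows that both routes build the same array, while the second one copies everything collected so far on each iteration:

```python
import numpy as np

def collect_with_list(chunks):
    """Append each chunk to a Python list, then join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return np.concatenate(parts, axis=0)

def collect_with_concatenate(chunks, width):
    """Grow the array on every step; each call allocates a new, larger array."""
    result = np.empty((0, width))  # zero rows, so no uninitialized values leak in
    for chunk in chunks:
        result = np.concatenate((result, chunk), axis=0)
    return result

# Five toy (2, 3) chunks filled with 0.0, 1.0, ..., 4.0
chunks = [np.full((2, 3), i, dtype=float) for i in range(5)]

a = collect_with_list(chunks)
b = collect_with_concatenate(chunks, width=3)

print(a.shape)               # (10, 3)
print(np.array_equal(a, b))  # True
```

Note that starting from np.empty((0, width)) rather than a full-sized np.empty avoids carrying uninitialized rows into the result.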
