pd.DataFrame from an array in 2 different ways

Question

I'm getting started with machine learning, so I gave the MNIST competition on Kaggle a try. I'm really curious about how things work, and since I couldn't find the answer online, I thought it would be a good idea to make my first post here.

I built a simple CNN model in Keras. This is the prediction, with the output from Google Colab:

Ynew = model.predict_classes(test_data)
Ynew.shape

(28000,)

Ynew

array([2, 0, 9, ..., 3, 9, 2])

Now I'm trying to make a DataFrame from this, and I don't really understand why one way works and the other doesn't.

This one works fine; I get a 28000x2 table:

labels = ["ImageId", "Label"]
col = list(range(1, 28001))
submission = pd.DataFrame({"ImageId": col, "Label": Ynew})

But with this one, everything gets crammed into a single row:

submission2 = pd.DataFrame(data=[[col, Ynew]], columns=labels)

Shouldn't both ways work the same? I hope the post wasn't too bad, and thank you!

Answer 1

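# Ynew is 1-D, so it can fill only one column; the ImageId values go into the index.
submission2 = pd.DataFrame(data=Ynew, index=pd.Index(col, name="ImageId"), columns=["Label"])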
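
To see how the constructor treats a 1-D array combined with an explicit index, here is a small sketch on toy values (the three-element `col` and `Ynew` are stand-ins I made up, not the real 28000 predictions). As far as I can tell, a 1-D array supplies a single column, so passing two column labels raises a ValueError, while one label works and the index can be turned into a column afterwards.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data (assumed values, for illustration only).
col = [1, 2, 3]
Ynew = np.array([2, 0, 9])

# A 1-D array provides one column, so two column labels do not fit.
try:
    pd.DataFrame(data=Ynew, index=col, columns=["ImageId", "Label"])
except ValueError as err:
    print("two labels for 1-D data:", err)

# One label works; the ids live in the index.
one_col = pd.DataFrame(data=Ynew, index=col, columns=["Label"])

# Name the index and move it into a regular column to get ImageId/Label.
two_col = one_col.rename_axis("ImageId").reset_index()
print(two_col)
```

The `rename_axis(...).reset_index()` step is only needed if ImageId has to be an ordinary column, for example before writing a submission file.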

Answer 2

A DataFrame can be created from:

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

Ref: pandas-docs
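
A few of the input types listed above can be tried side by side. This is only a sketch with made-up toy values (`ids` and `preds` are hypothetical names, not from the question); it shows that a dict maps keys to columns, a 2-D ndarray maps rows to rows, and a Series becomes a single column.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumed values, for illustration only).
ids = [1, 2, 3]
preds = np.array([2, 0, 9])

# Dict of 1-D arrays/lists: each key becomes a column.
from_dict = pd.DataFrame({"ImageId": ids, "Label": preds})

# 2-D ndarray: each row of the array becomes a row of the frame.
arr_2d = np.array([[1, 2], [2, 0], [3, 9]])
from_2d = pd.DataFrame(arr_2d, columns=["ImageId", "Label"])

# A Series: becomes a frame with a single column named after the Series.
from_series = pd.DataFrame(pd.Series(preds, name="Label"))

print(from_dict.shape, from_2d.shape, from_series.shape)
```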

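In your case, Ynew is a 1D ndarray and col is a list. IMHO, the most direct way to build the DataFrame is from a dict of Ynew and col, as you did in the first method.

In the second method, [[col, Ynew]] is a list holding a single row with two cells, so pandas puts the whole list in one cell and the whole array in the other, which gives a 1x2 frame. To make it work, you need to combine Ynew and col into a 2D ndarray first.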

import numpy as np

d = np.vstack([col, Ynew]).T  # stack as 2 rows, then transpose to shape (28000, 2)
submission2 = pd.DataFrame(data=d, columns=labels)
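
To make the difference in shapes concrete, here is a small sketch on toy data (five made-up values standing in for the 28000 predictions). It compares the nested-list call from the question with two alternatives that I believe are equivalent to the vstack approach: `np.column_stack` and `zip`.

```python
import numpy as np
import pandas as pd

labels = ["ImageId", "Label"]
col = list(range(1, 6))            # stand-in for list(range(1, 28001))
Ynew = np.array([2, 0, 9, 3, 9])   # stand-in for the model predictions

# [[col, Ynew]] is ONE row with two cells; each cell holds a whole sequence.
cramped = pd.DataFrame(data=[[col, Ynew]], columns=labels)
print(cramped.shape)

# Stack the two 1-D sequences as the columns of a 2-D array.
stacked = pd.DataFrame(np.column_stack([col, Ynew]), columns=labels)
print(stacked.shape)

# Or pair the values up row by row.
zipped = pd.DataFrame(list(zip(col, Ynew)), columns=labels)
print(zipped.shape)
```

Both alternatives give one row per image, like the dict-based constructor in the question.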

2 answers

solution1
0 2018-09-29 04:20:26

solution2
0 ACCEPTED 2018-09-29 20:38:12
