
pd.DataFrame from an array in 2 different ways

I'm getting started with machine learning, so I tried the MNIST dataset from Kaggle. I'm curious about how things work, and since I couldn't find the answer online, I thought it would be a good idea to make my first post here.

I built a simple CNN model in Keras. This is the prediction, with the output from Google Colab:

Ynew =model.predict_classes(test_data)
Ynew.shape

(28000,)

Ynew

array([2, 0, 9, ..., 3, 9, 2])

Now I'm trying to make a DataFrame from this, and I don't understand why it works one way and not the other.

This one works fine; I get a 28000x2 table:

labels = ["ImageId","Label"]
col= list(range(1,28001))
submission=pd.DataFrame({"ImageId":col,"Label":Ynew})

But with this one, everything gets crammed into a single row:

submission2=pd.DataFrame(data=[[col,Ynew]],columns=labels)

Shouldn't both ways work the same? I hope the post wasn't too bad, and thank you!

# A 1D array fills a single column, so pass one column label; col becomes the index.
submission2 = pd.DataFrame(data=Ynew, index=col, columns=["Label"])

A DataFrame can be created from:

  1. Dict of 1D ndarrays, lists, dicts, or Series
  2. 2-D numpy.ndarray
  3. Structured or record ndarray
  4. A Series
  5. Another DataFrame

Ref: pandas-docs
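
As a rough illustration of a few of the input types listed above, here is a minimal sketch. The three-element `col` and `Ynew` are small stand-ins for the question's data, not the real values.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the question's data (assumed values, for illustration only).
col = [1, 2, 3]
Ynew = np.array([2, 0, 9])

# Dict of 1D arrays/lists: each key becomes a column.
from_dict = pd.DataFrame({"ImageId": col, "Label": Ynew})

# 2-D ndarray: each row of the array becomes a row of the frame.
from_2d = pd.DataFrame(np.column_stack([col, Ynew]), columns=["ImageId", "Label"])

# A Series: becomes a single-column frame.
from_series = pd.DataFrame(pd.Series(Ynew, name="Label"))

print(from_dict.shape, from_2d.shape, from_series.shape)
```

The dict and 2-D forms give the same two-column table here; the Series form gives one column.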

In your case, Ynew is a 1D ndarray and col is a list. IMHO, the most direct way to build the DataFrame from them is a dict of col and Ynew, as you did in the first method.

In the second method, [[col, Ynew]] is a list containing a single row, so each whole sequence ends up in one cell. For it to work, you need to combine col and Ynew into a 2D ndarray.

d = np.vstack([col, Ynew]).T  # shape (28000, 2)
submission2 = pd.DataFrame(data=d, columns=labels)
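
To see the difference on a small scale, here is a sketch with three-element stand-ins for `col` and `Ynew` (assumed values, not the real predictions). It also shows `zip` as an alternative way to build row-wise data; that variant is not from the original answer.

```python
import numpy as np
import pandas as pd

labels = ["ImageId", "Label"]
col = [1, 2, 3]             # stand-in for list(range(1, 28001))
Ynew = np.array([2, 0, 9])  # stand-in for the model predictions

# One element in the outer list means one row; each cell holds a whole sequence.
one_row = pd.DataFrame(data=[[col, Ynew]], columns=labels)

# Stack into 2 x 3, then transpose to 3 x 2 so each pair becomes a row.
d = np.vstack([col, Ynew]).T
stacked = pd.DataFrame(data=d, columns=labels)

# Alternative: a list of (ImageId, Label) tuples, one tuple per row.
by_rows = pd.DataFrame(list(zip(col, Ynew)), columns=labels)

print(one_row.shape, stacked.shape, by_rows.shape)
```

When data is a list of lists, each inner list is one row, so the data has to be arranged row by row before it is passed in.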
