
How to convert each row in Pandas DF to 2D numpy array?

I have a Pandas dataframe with 785 columns (784 feature columns and 1 label column).

I want to train a CNN on it, so I need to convert each row of the dataframe into an "image" of shape (28x28x1) (28x28=784). The new dataframe would then contain only 2 columns.

How can I do it?

The idea is shown in the picture: [image]

The following code is an example with 4 feature columns and a label.

import numpy as np
import pandas as pd

csv_path = './example.xlsx'

df = pd.read_excel(csv_path)
df_new = pd.DataFrame(columns=['Column', 'label'])

for i, row in df.iterrows():
    a = []
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for j, value in row.items():
        if j == 'label':
            label = value
            break
        a.append(value)
    # Pack the 4 feature values into a 2x2 array
    img = np.array(a).reshape(2, 2)
    df_new.loc[i] = [img, label]

df_new.to_csv('result.csv', index=False)


X_train = X_train.values.reshape(-1,28,28,1)
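For context, here is a minimal sketch of that one-liner on synthetic data. It assumes the dataframe has 784 pixel columns plus a column named label; the random values, the pixel0..pixel783 column names, and the y_train variable are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 5 rows, 784 pixel columns plus a label.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, 784))
df = pd.DataFrame(pixels, columns=[f"pixel{i}" for i in range(784)])
df["label"] = [0, 1, 2, 3, 4]

# Separate the label from the features, then reshape every row
# into a 28x28 image with a single channel.
y_train = df["label"].to_numpy()
X_train = df.drop(columns="label").to_numpy().reshape(-1, 28, 28, 1)

print(X_train.shape)  # (5, 28, 28, 1)
print(y_train.shape)  # (5,)
```

This keeps the images in one 4D array rather than in a dataframe column, which is usually the shape a CNN input layer expects (batch, height, width, channels).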

If you are looking for something like [[1,2,3,4]], you can use the code below. If you need the array in the form [[1,2], [3,4]], change the reshape call to np.reshape(arr, (2, 2)); for the full 784-column data the corresponding shape is (28, 28). Here df is the original dataframe with the feature columns and the label column.

First, get all columns except the label:

columns = df.columns.to_list()
columns.remove('label')

Then, use the apply method to convert the df to the format you need:

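def arrayize(row):
    # With axis=1, apply passes each row in as a Series
    arr = row.to_numpy()
    arr = arr.astype(int)
    # (-1, 4) gives [[1, 2, 3, 4]]; use (2, 2) for [[1, 2], [3, 4]]
    arr = np.reshape(arr, (-1, 4))

    return arr

df["numpy_col"] = df[columns].apply(arrayize, axis=1)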

I haven't run the code to test it, so you may need to check it.
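As an alternative to the row-wise apply above, the same two-column result can be sketched by reshaping all rows at once and then splitting the array into one small array per row. This is only a sketch on a toy frame that mirrors the 4-column example; the column names a, b, c, d, the new column name image, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question's example: 4 feature columns and a label.
df = pd.DataFrame(
    {"a": [1, 5], "b": [2, 6], "c": [3, 7], "d": [4, 8], "label": [0, 1]}
)

feature_cols = [c for c in df.columns if c != "label"]

# Reshape all rows in one call: (n_rows, 4) -> (n_rows, 2, 2).
images = df[feature_cols].to_numpy().reshape(-1, 2, 2)

# list(images) yields one 2x2 array per row, stored in an object column.
df_new = pd.DataFrame({"image": list(images), "label": df["label"].to_numpy()})

print(df_new["image"].iloc[0])
```

For the 784-column data the reshape target would be (-1, 28, 28). Note that writing such a column with to_csv stores the text representation of each array, so a binary format such as np.save on the 3D array may be a better fit if the images need to be reloaded.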
