TensorFlow Keras model not learning and accuracy extremely low

I am new to ML, and I'm trying to build a model that predicts a product ID in one product category from a product ID in another category. The data looks like this:

Product_A Product_B
14432 91342
14463 2344

I have tried to one-hot encode the labels and features, but the model is not learning at all. The idea is that you feed in product 1 as a one-hot vector (0 0 0 1 0 0 0 ...) and use product 2 as the one-hot label (0 1 0 0 0 ...).

The net has as many input neurons as there are products in the first category, and as many output neurons as there are products in the second category. For example, with T-shirts (200 products) and beanies (500 products), I thought I should use 200 input neurons and 500 output neurons to predict the beanie.

That's my code:

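import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical


def import_data(url):
    # Load the raw CSV into a DataFrame
    dataframe = pd.read_csv(url)
    return dataframe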


def prepare_df(dataframe):
    # Each row of 'product_ids' holds two comma-separated IDs
    split_data = dataframe['product_ids'].str.split(',', n=1, expand=True)
    split_data = split_data.rename(columns={0: 'Beanie', 1: 'Shirt'})
    # Drop rows where either ID is missing
    split_data = split_data.dropna()
    # Replace the raw IDs with consecutive integer category codes (0..n-1)
    split_data['Beanie'] = pd.to_numeric(split_data['Beanie']).astype('category').cat.codes
    split_data['Shirt'] = pd.to_numeric(split_data['Shirt']).astype('category').cat.codes

    return split_data


def create_model():
    my_model = Sequential()
    my_model.add(Input(shape=(127,)))
    my_model.add(Dense(127, activation='sigmoid'))
    my_model.add(Dense(127, activation='sigmoid'))
    my_model.add(Dense(607, activation='sigmoid'))

    return my_model


if __name__ == '__main__':
    data = import_data('data/_Data_Beanies_Shirts_2.csv')

    data = prepare_df(data)

    X = data['Shirt']
    y = data['Beanie']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, shuffle=True)

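    # Fix the one-hot width so the train and test arrays always match the
    # model's 127 inputs and 607 outputs (assumed to be the category counts)
    X_train = to_categorical(X_train, num_classes=127)
    X_test = to_categorical(X_test, num_classes=127)
    y_train = to_categorical(y_train, num_classes=607)
    y_test = to_categorical(y_test, num_classes=607)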

    optimizer = Adam(learning_rate=0.001)

    model = create_model()

    model.compile(
        loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']
    )
    print("Fit model on training data")
    history = model.fit(
        x=X_train, y=y_train, epochs=100, verbose=1,
        validation_data=(X_test, y_test))

Do I maybe need something like Embeddings?

It would be great if someone could help me with this. Any help is greatly appreciated. Thank you!
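On the embeddings question, here is a minimal sketch of what such a model might look like. It is not taken from the original post. It assumes the 127 shirt codes and 607 beanie codes implied by the layer sizes above, and the function name `create_embedding_model` and the embedding size of 16 are made up for illustration. The sketch also swaps the sigmoid output and binary cross-entropy for a softmax output with sparse categorical cross-entropy. That is the usual pairing when each sample has exactly one correct class, and it accepts integer labels directly, so neither the features nor the labels need to be one-hot encoded.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten, Input


def create_embedding_model(num_shirts, num_beanies, embedding_dim=16):
    """Map one integer shirt code to a probability distribution over beanies."""
    model = Sequential([
        Input(shape=(1,), dtype='int32'),           # one integer shirt code per sample
        Embedding(input_dim=num_shirts, output_dim=embedding_dim),
        Flatten(),                                  # (batch, 1, dim) -> (batch, dim)
        Dense(num_beanies, activation='softmax'),   # probabilities that sum to 1
    ])
    model.compile(
        loss='sparse_categorical_crossentropy',     # expects integer class labels
        optimizer='adam',
        metrics=['accuracy'],
    )
    return model


if __name__ == '__main__':
    # Random stand-in data, NOT the real product pairs: it only shows the
    # expected shapes, so the accuracy printed by fit() is meaningless here.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 127, size=(1000, 1))
    y = rng.integers(0, 607, size=1000)

    model = create_embedding_model(num_shirts=127, num_beanies=607)
    model.fit(X, y, epochs=2, validation_split=0.2, verbose=1)
```

With the real data, `X` would be the integer `Shirt` codes reshaped to one column and `y` the integer `Beanie` codes, both straight from `prepare_df` without `to_categorical`.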

For one thing, I don't think you can predict IDs. This doesn't make a lot of sense; it's kind of like trying to predict people's names. For another thing, try other models. Different models, or algorithms, learn differently from the data that's fed into them. Try each of these models; you will learn a lot!

Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms
    GBM
    XGBoost
    LightGBM
    CatBoost
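Before working through the list above, it may help to measure how much signal a single product ID carries at all. The sketch below is not part of the original answer. It uses two frequency baselines with made-up helper names and toy data: one always predicts the most common beanie, the other predicts the beanie most often paired with each shirt in the training data. When the shirt ID is the only input feature, a model has little beyond these pairing counts to learn from, so the second number is a rough reference point for any of the models listed.

```python
from collections import Counter, defaultdict


def majority_baseline(train_pairs, test_pairs):
    """Always predict the single most common beanie in the training data."""
    most_common_beanie = Counter(b for _, b in train_pairs).most_common(1)[0][0]
    hits = sum(1 for _, beanie in test_pairs if beanie == most_common_beanie)
    return hits / len(test_pairs)


def lookup_baseline(train_pairs, test_pairs):
    """Predict, for each shirt, the beanie it was most often paired with."""
    per_shirt = defaultdict(Counter)
    for shirt, beanie in train_pairs:
        per_shirt[shirt][beanie] += 1
    # Shirts never seen in training fall back to the overall most common beanie
    fallback = Counter(b for _, b in train_pairs).most_common(1)[0][0]
    hits = 0
    for shirt, beanie in test_pairs:
        counts = per_shirt.get(shirt)
        guess = counts.most_common(1)[0][0] if counts else fallback
        hits += guess == beanie
    return hits / len(test_pairs)


if __name__ == '__main__':
    # Made-up (shirt, beanie) pairs purely for illustration
    train = [(1, 10), (1, 10), (1, 11), (2, 20), (2, 20), (3, 10)]
    test = [(1, 10), (2, 20), (3, 11), (4, 10)]
    print('majority baseline:', majority_baseline(train, test))  # 0.5
    print('lookup baseline:  ', lookup_baseline(train, test))    # 0.75
```

If the lookup baseline on the real data is barely above the majority baseline, switching models is unlikely to help much, and extra features would be the more promising direction.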

See the links below for some ideas of how to get started.

https://scikit-learn.org/stable/supervised_learning.html

https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/

