
How to train a CNN on an unlabeled dataset?

I want to train a CNN on my unlabeled data, and from what I read on Keras/Kaggle/TF documentation or Reddit threads, it looks like I will have to label my dataset beforehand. Is there a way to train the CNN in an unsupervised way?
I cannot understand how to initialize y_train and y_test (where y_train and y_test have their usual meanings).
The information about my dataset is as follows:

  1. I have 50,000 matrices of dimension 30 x 30.
  2. Each matrix is divided into 9 subareas (for understanding, as separated by the vertical and horizontal bars).
  3. A subarea is said to be active if it has at least one element equal to 1. If all elements of that subarea are equal to 0, the subarea is inactive.
  4. For the first example shown below, I should get as output the names of subareas that are active, so here, (1, 4, 5, 6, 7, 9).
  5. If no subarea is active, as in the second example, the output should be 0.

First example: Output - (1, 4, 5, 6, 7, 9) [image: first example]

Second example: Output - 0 [image: second example]

After creating these matrices, I did the following:

  1. I put these matrices in a CSV file after reshaping them into vectors of dimension 900 x 1.
  2. So basically, each row in the CSV contains 900 columns with values either 0 or 1.
  3. The classes for my classification problem are the numbers 0-9, where 0 means that no subarea is active (no element equal to 1).

For my model, I want the following:

  • Input: a 900 x 1 vector as described above.
  • Output: one of the values from 0-9,
    where 1-9 represent the active subareas, and 0 represents no active subarea.


I am able to retrieve the data from the CSV file into a data frame and split the data frame into x_train and x_test. But I am unable to understand how to set my y_train and y_test values.
My problem seems very similar to the MNIST dataset, except I don't have the labels. Would it be possible for me to train the model without the labels?

My code currently looks like this:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Read the dataset from the CSV file into a dataframe
df = pd.read_csv("bci_dataset.csv")

# Split the dataframe into training and test dataset
train, test = train_test_split(df, test_size=0.2)

x_train = train.iloc[:, :]
x_test = test.iloc[:, :]

print(x_train.shape)
print(x_test.shape)

Thank you, in advance, for reading this whole thing and helping me out!

Can you tell us why you want to use a CNN specifically? Generally, neural networks are used when there's some complication involved in going from features to output: the artificial neurons learn different behavior as a result of being exposed to the ground truth (i.e., the labels). Most of the time, the researcher using the neural network doesn't even know which features of the input data the network uses to reach its conclusions.

In the case you have given us, it looks a little bit more like you know what features are important (that is, the sum of a subarea has to be greater than 0 in order to be active). The neural network wouldn't need to really learn anything in particular to do its job. Although it doesn't seem necessary to use a neural network for this process, it does make sense for you to automate it, given the size of your input data! :)
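To illustrate how direct the rule is, here is a minimal sketch for a single matrix. It assumes the 9 subareas form a 3 x 3 grid of 10 x 10 blocks numbered 1-9 in row-major order (left to right, top to bottom); the question doesn't state the numbering, so adjust if yours differs. The name `active_subareas` is made up for this example.

```python
import numpy as np

def active_subareas(matrix, grid=3):
    """Return the 1-based indices of subareas that contain at least one 1.

    Assumes a square matrix split into a grid x grid layout of equal blocks,
    numbered in row-major order (an assumption, not stated in the question).
    """
    matrix = np.asarray(matrix)
    n = matrix.shape[0] // grid  # side length of one subarea (10 for 30 x 30)
    # Axes become (block row, row within block, block column, column within block)
    blocks = matrix.reshape(grid, n, grid, n)
    # A subarea is active if any of its elements is nonzero
    active = blocks.any(axis=(1, 3)).ravel()
    return tuple(int(i) + 1 for i in np.flatnonzero(active))

example = np.zeros((30, 30), dtype=int)
example[0, 0] = 1      # falls in subarea 1
example[15, 25] = 1    # falls in subarea 6 (middle block row, right block column)
example[29, 29] = 1    # falls in subarea 9
print(active_subareas(example))  # (1, 6, 9)
```

An all-zero matrix gives an empty tuple here; map that to 0 if you need the output format from your question.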

Let me know if I'm misunderstanding your situation, though?

Edit: To contrast this with the MNIST dataset: when identifying handwritten digits, there is ambiguity that the network has to learn to deal with. Not every style of handwriting renders a 7 the same way. A neural network can pick up some of the features of a 7 (i.e., there is a high probability that a 7 has a diagonal line running from top right to bottom left, which, depending on how you write, could be slightly curved or offset), as well as a few different versions of a 7 (some people put a horizontal slash through the middle, others don't). The utility of a neural network here is in working through all that ambiguity and probabilistically classifying an input as a 7 (because it has seen previous images that it "knows" are 7s).

However, in your case, there is only one way for your answer to be rendered: if any element in a subarea is greater than 0, the subarea is active! So you don't need to train a network to do anything. You just need to write some code that automates the summing of the subareas.
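If you still want y_train and y_test (say, to experiment with a network anyway), the same rule can generate labels for the whole dataset. The sketch below is one possible approach, not the only one. It assumes each 900-element row is a 30 x 30 matrix flattened in row-major order and that subareas are 10 x 10 blocks numbered row by row. Because several subareas can be active at once, it encodes each label as a 9-element 0/1 vector rather than a single class from 0-9. `make_labels` is a hypothetical helper name, and the random array stands in for your CSV data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_labels(X, side=30, grid=3):
    """Build multi-hot labels of shape (n_samples, grid * grid).

    Entry [i, k] is 1 when subarea k + 1 of sample i contains a nonzero value.
    Assumes row-major flattening and row-major subarea numbering.
    """
    X = np.asarray(X)
    n = side // grid  # side length of one subarea
    # Axes: (sample, block row, row in block, block column, column in block)
    blocks = X.reshape(-1, grid, n, grid, n)
    return blocks.any(axis=(2, 4)).reshape(len(X), grid * grid).astype(int)

# Stand-in data: 100 sparse random 0/1 vectors of length 900.
# With the real dataset this would be something like X = df.to_numpy().
rng = np.random.default_rng(0)
X = (rng.random((100, 900)) < 0.002).astype(int)
y = make_labels(X)

# Passing X and y together keeps every sample aligned with its label
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(x_train.shape, y_train.shape)  # (80, 900) (80, 9)
```

A row of all zeros in `y` corresponds to the "no active subarea" case that your question labels as 0.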


 