
What is a model in data mining?

  1. I want to know what exactly a MODEL is in data mining. Can anyone explain that?

  2. When I use Weka, I load my data, choose a method, and generate a MODEL by clicking the Start button. Can anyone explain what is behind this model and how it works after I have generated it? Does it use my chosen method, for example, to classify a new example?

Could someone please explain these things?

The model simply describes the information that is used when trying to deal with new data. In a simple spam detection scenario the algorithm determines which words seem to point to spam and which don't by looking at annotated emails. The lists of words then form your model.

When you receive a new email, you don't compare it with other real emails. Instead, you look at the new email's words and check them against your model (the word lists) to see whether they indicate spam or not. This makes you independent of your training data: what you keep is a piece of knowledge that tries to model the whole "spam vs. non-spam" reality.
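The word-list idea above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular spam filter or Weka classifier works: the function names `train` and `is_spam`, the four training emails, and the "appears in more spam than non-spam emails" rule are all assumptions made up for the example.

```python
from collections import Counter

def train(annotated_emails):
    """Build the model: two word lists learned from (text, label) pairs."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, labeled_spam in annotated_emails:
        target = spam_counts if labeled_spam else ham_counts
        # Count each word once per email.
        target.update(set(text.lower().split()))
    # Assumed rule: a word points to spam if it occurs in more spam emails
    # than non-spam emails, and vice versa.
    spam_words = {w for w, c in spam_counts.items() if c > ham_counts[w]}
    ham_words = {w for w, c in ham_counts.items() if c > spam_counts[w]}
    return {"spam_words": spam_words, "ham_words": ham_words}

def is_spam(model, text):
    """Classify a new email using only the model, not the training emails."""
    words = set(text.lower().split())
    spam_hits = len(words & model["spam_words"])
    ham_hits = len(words & model["ham_words"])
    return spam_hits > ham_hits

# Hypothetical annotated emails (True means spam).
training_emails = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]

model = train(training_emails)
del training_emails  # the model no longer needs the training data

print(is_spam(model, "claim your free money"))
print(is_spam(model, "agenda for lunch"))
```

After `train` has run, the training emails can be thrown away; only the two word lists are consulted when a new email arrives.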

Suppose there are only the following variables related to music: guitar solos (has/hasn't), sudden tone changes (has/hasn't), vocal (has/hasn't, male/female), drums (has/hasn't, regular/electronic).

Now, let's suppose that you enjoy music when it has guitar solos, sudden tone changes, female vocals, and electronic drums. On the other hand, I appreciate music when it has guitar solos, sudden tone changes, no vocals, and regular drums.

Those preferences can be thought of as our models for enjoying music.

Now, suppose there's a song which has guitar solos, sudden tone changes, female vocals, and electronic drums. If we had to tell whether you would enjoy this song, the answer would be yes: that's a 100% match. But what about me? Well, I appreciate 3 of the 5 features of the song, so I'd likely enjoy it.

The answer we gave above about whether each of us would appreciate the song can be regarded as a classification task in machine learning. If, instead, we had to group everyone according to their musical preferences and the music features above, we'd be clustering the music listeners, and so on.

How do we build a model for something? Of course, from data. When you're working with Weka, your .arff files contain your training data, which Weka uses to learn about the thing depicted by those data (in our example, it would learn our musical preferences).

The learning process generates a model, which is then used to classify new data, group them, etc. For instance, if we provided Weka with our music preferences and instructed it to learn our models with a Bayesian classifier, then when we give it the features of a given song, it will be able to tell whether we'd like that song, and with what probability.
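To make the "learn, then predict with a probability" step concrete, here is a rough sketch of a naive Bayes classifier over the music features, written in plain Python rather than with Weka's own API. Everything in it is an assumption for illustration: the function names, the four-song listening history, the feature encoding, and the use of add-one smoothing. Weka's Bayesian classifiers are more elaborate, but the shape is the same: training produces a table of counts (the model), and prediction only reads from that table.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Learn the model: class counts plus per-feature value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> counts of values
    feature_values = defaultdict(set)    # feature -> all values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
    }

def predict_proba(model, song):
    """Return P(label | song) for each label, using add-one smoothing."""
    total = sum(model["class_counts"].values())
    scores = {}
    for label, n in model["class_counts"].items():
        score = n / total  # prior probability of the label
        for feature, value in song.items():
            seen = model["value_counts"][(feature, label)][value]
            k = len(model["feature_values"][feature])
            score *= (seen + 1) / (n + k)  # smoothed likelihood
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical listening history for one listener.
songs = [
    {"guitar_solo": "yes", "tone_changes": "yes", "vocals": "female", "drums": "electronic"},
    {"guitar_solo": "yes", "tone_changes": "yes", "vocals": "female", "drums": "regular"},
    {"guitar_solo": "no", "tone_changes": "no", "vocals": "male", "drums": "regular"},
    {"guitar_solo": "no", "tone_changes": "yes", "vocals": "none", "drums": "none"},
]
opinions = ["like", "like", "dislike", "dislike"]

model = train_naive_bayes(songs, opinions)

new_song = {"guitar_solo": "yes", "tone_changes": "yes", "vocals": "female", "drums": "electronic"}
probs = predict_proba(model, new_song)
print(probs)
```

The dictionary returned by `train_naive_bayes` plays the role of the model that Weka builds when you click Start: it summarizes the training data, and `predict_proba` needs nothing else to score a new song.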
