How to organize data for Multilevel modeling - Decision Tree, Classification, or Regression

I have three tables - Sales Manager, Customer, and Order. Each sales manager has multiple customers, and each customer can have multiple orders.

I am interested in determining whether certain attributes of the sales manager and of the customer lead to sales of a particular product (let's say Product A, yes/no).

Suppose I have 3 sales managers, 10 customers, and 20 orders.

Should I structure the data set to have 3 rows, 10 rows, or 20 rows? Please advise.

Also, will a decision tree or other classification algorithm automatically understand the hierarchical relationships among manager, customer, and order?

Thanks.

I think you should make one big feature matrix out of it. Suppose you have the tables

Sales Manager (id attr_1 ... attr_m)
Customer (id attr_1 ... attr_n sales_manager_id)
Order (id product_id_1 ... product_id_l customer_id)

Then it is probably reasonable to build a matrix of the following form

Matrix:
product_id order_attr_1 ... order_attr_l customer_attr_1 ... customer_attr_n ... manager_attr_1 ... manager_attr_m

Now you have a matrix with 20*l rows (one row per product in each of the 20 orders), and each row carries all the attributes of the corresponding order, customer, and sales manager.
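As a rough illustration of that flattening step, here is a minimal sketch in plain Python. The toy tables, ids, and attribute names (manager_region, customer_segment, is_product_a, and so on) are made up for the example and are not from the question; the target column simply marks whether the row's product is Product A.

```python
# Toy tables; all ids and attribute names are hypothetical.
managers = {
    1: {"manager_region": "north", "manager_years": 5},
    2: {"manager_region": "south", "manager_years": 2},
}
customers = {
    10: {"customer_segment": "retail", "sales_manager_id": 1},
    11: {"customer_segment": "wholesale", "sales_manager_id": 1},
    12: {"customer_segment": "retail", "sales_manager_id": 2},
}
orders = [
    {"id": 100, "customer_id": 10, "product_ids": ["A", "B"]},
    {"id": 101, "customer_id": 11, "product_ids": ["C"]},
    {"id": 102, "customer_id": 12, "product_ids": ["A"]},
]


def flatten(orders, customers, managers):
    """Return one row per (order, product) with customer and manager attributes attached."""
    rows = []
    for order in orders:
        # Follow the foreign keys upward: order -> customer -> sales manager.
        customer = customers[order["customer_id"]]
        manager = managers[customer["sales_manager_id"]]
        for product_id in order["product_ids"]:
            row = {
                "order_id": order["id"],
                "customer_id": order["customer_id"],
                "product_id": product_id,
            }
            row.update(customer)  # customer attributes (and sales_manager_id)
            row.update(manager)   # manager attributes
            row["is_product_a"] = product_id == "A"  # yes/no target
            rows.append(row)
    return rows


rows = flatten(orders, customers, managers)
for row in rows:
    print(row)
```

Manager and customer attributes are repeated on every row that belongs to them, which is exactly what makes the table flat. The id columns are kept only for bookkeeping and would normally be dropped before training.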

In the simplest form you can use this matrix directly for classification. If there are too many attributes, it may be reasonable to apply PCA first. You could also try Weka and see what turns out.

As for your question about the hierarchical relationships: classification algorithms will not understand them explicitly.
I would recommend the book Introduction to Data Mining, as it answers most of your questions.
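Since a standard classifier treats each row as an independent observation, one common precaution (an addition here, not something stated in the answer) is to keep all rows of the same sales manager on the same side of a train/test split, so that repeated manager attributes do not leak between the two sets. A minimal sketch, assuming each row carries a sales_manager_id key; the function name split_by_group and the sample rows are hypothetical:

```python
import random


def split_by_group(rows, group_key, test_fraction=0.34, seed=0):
    """Split rows so that all rows sharing a group value land on the same side."""
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one whole group for testing.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


# Hypothetical flattened rows: several orders per manager.
rows = [
    {"sales_manager_id": 1, "customer_id": 10, "is_product_a": True},
    {"sales_manager_id": 1, "customer_id": 11, "is_product_a": False},
    {"sales_manager_id": 2, "customer_id": 12, "is_product_a": True},
    {"sales_manager_id": 2, "customer_id": 12, "is_product_a": False},
    {"sales_manager_id": 3, "customer_id": 13, "is_product_a": False},
]

train, test = split_by_group(rows, "sales_manager_id")
print(len(train), "training rows,", len(test), "test rows")
```

The same idea works one level down by grouping on customer_id instead. It does not make the model hierarchy-aware; it only keeps the evaluation from being overly optimistic.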
