
How do I apply a machine learning model to new data?

I am new to data science. I have a question about applying a random forest to new data.

I have this table.

Y prop_A prop_B
A   0.8    0.2
A   0.7    0.3
B   0.5    0.5
B   0.4    0.6
B   0.1    0.9

My assumption is that the higher a group's proportion, the more likely the row belongs to that group. I built a random forest model and tested it on a validation set (80/20 split).

I thought the model above could be used on new data. Here is an example of that data. The structure and the meaning of the variables are the same, but the number of variables is different.
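For reference, the setup described above might look roughly like the following. This is only a minimal sketch, assuming pandas and scikit-learn; the names `train_df`, `model`, and the `random_state` values are illustrative and not taken from my actual code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Labeled table from the question.
train_df = pd.DataFrame({
    "Y": ["A", "A", "B", "B", "B"],
    "prop_A": [0.8, 0.7, 0.5, 0.4, 0.1],
    "prop_B": [0.2, 0.3, 0.5, 0.6, 0.9],
})

X = train_df[["prop_A", "prop_B"]]
y = train_df["Y"]

# 80/20 split; with only five rows this leaves a single validation row.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# random_state is fixed only to make the sketch reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

val_accuracy = model.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.2f}")
```

With a table this small the validation score says very little; the real data would need many more rows for the split to be meaningful.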

Y prop_C prop_D prop_E prop_F
-   0.8    0.1   0.05   0.05
-   0.6    0.3   0.05   0.05
-   0.5    0.4   0.05   0.05
-   0.4    0.2   0.4     0
-   0.1    0.5   0.4    0.4

The new data is unlabeled, so I would like to label it using the random forest I trained on the previous data. Is this the right approach to labeling the new data?

In practice, the model doesn't work on the new data (because the independent variables are different).

How should I label the new data with a model trained on labeled data whose variables are different?

The number of independent variables, and the variables themselves, must be the same as in the training data. If you want to give it a try, just omit prop_E and prop_F and rename prop_C and prop_D to prop_A and prop_B; then it will work.
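The suggestion above could be sketched as follows. This is a rough illustration, assuming pandas and scikit-learn; the names `new_df` and `aligned` are hypothetical, and treating prop_C as prop_A and prop_D as prop_B is a modelling assumption that only you can judge from what the columns mean.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Labeled table from the question, used here to train the model.
train_df = pd.DataFrame({
    "Y": ["A", "A", "B", "B", "B"],
    "prop_A": [0.8, 0.7, 0.5, 0.4, 0.1],
    "prop_B": [0.2, 0.3, 0.5, 0.6, 0.9],
})
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_df[["prop_A", "prop_B"]], train_df["Y"])

# Unlabeled table from the question, copied as given.
new_df = pd.DataFrame({
    "prop_C": [0.8, 0.6, 0.5, 0.4, 0.1],
    "prop_D": [0.1, 0.3, 0.4, 0.2, 0.5],
    "prop_E": [0.05, 0.05, 0.05, 0.4, 0.4],
    "prop_F": [0.05, 0.05, 0.05, 0.0, 0.4],
})

# Omit prop_E and prop_F, then rename the remaining columns so they
# match the feature names and count the model was trained on.
aligned = new_df[["prop_C", "prop_D"]].rename(
    columns={"prop_C": "prop_A", "prop_D": "prop_B"}
)

new_df["Y"] = model.predict(aligned)
print(new_df)
```

Passing `new_df` with all four columns straight to `model.predict` raises a `ValueError`, because the model expects exactly the two features it saw during fitting.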

