How to use classifier random forest in Python for 2 different data sets?

Question

I have 2 data sets with different variables, but both include a variable, say NUM, that identifies the occurrence of an event. Using NUM, I was able to identify the event and label it. How can I run a random forest (RF) so that it takes both datasets into account? I cannot append them column-wise because the number of records for each NUM differs.

Answer 1

From the way your question is phrased, I'm guessing you have two pandas dataframes.

You can use pandas.merge to pull the two together by joining them on NUM. A left join might be what you're looking for, but if you only want rows whose NUM value appears in both dataframes, use an inner join.

See the documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

Here's how that might look:

pd.merge(df1, df2, how='left', on='NUM')
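To make the difference between the two join types concrete, here is a minimal sketch on toy data. The frames, the columns `a` and `b`, and their values are made up for illustration; only the `NUM` key comes from the question. It joins with `on="NUM"`, which works when both frames use the same key name.

```python
import pandas as pd

# Hypothetical toy frames; only the NUM column name comes from the question.
df1 = pd.DataFrame({"NUM": [1, 2, 3], "a": [10, 20, 30]})
df2 = pd.DataFrame({"NUM": [2, 3, 3, 4], "b": [0.2, 0.3, 0.35, 0.4]})

# Left join: keeps every row of df1; NUM values missing from df2 get NaN in "b".
left = pd.merge(df1, df2, how="left", on="NUM")

# Inner join: keeps only NUM values present in both frames.
inner = pd.merge(df1, df2, how="inner", on="NUM")

print(left)
print(inner)
```

Note that a NUM with several records in one frame (here NUM 3 in `df2`) produces several rows after the join, so the merged frame can be longer than either input.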

Answer 2

You could try stacking the two datasets so that NUM is a single shared column, while the first and second datasets each keep their own independent columns, with the non-matching cells left empty. Whether the results will be any good depends heavily on your data.
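One way this could look, as a rough sketch only: the toy data, the columns `a` and `b`, the `labels` mapping, and the `-1.0` fill value are all assumptions for illustration, not something from the question. The empty cells are filled with a sentinel before fitting, since not every scikit-learn version accepts missing values in a random forest.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy frames with different record counts per NUM.
df1 = pd.DataFrame({"NUM": [1, 1, 2, 3], "a": [0.1, 0.2, 0.9, 0.8]})
df2 = pd.DataFrame({"NUM": [1, 2, 2, 3], "b": [5.0, 1.0, 1.5, 1.2]})

# Hypothetical event label for each NUM (assumed to be known already).
labels = {1: 0, 2: 1, 3: 1}

# Stack row-wise: NUM is shared, "a" and "b" are NaN where they do not apply.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)
stacked["event"] = stacked["NUM"].map(labels)

# NUM is left out of the features because the label was derived from it.
# Fill empty cells with a sentinel value (an arbitrary choice here).
X = stacked[["a", "b"]].fillna(-1.0)
y = stacked["event"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
```

With this layout no single row carries features from both datasets, so the forest never sees `a` and `b` together. If that interaction matters, joining on NUM is probably the better starting point.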
