
How to use a random forest classifier in Python with 2 different data sets?

I have 2 data sets with different variables, but both include a variable, say NUM, that identifies the occurrence of an event. Using NUM, I was able to label each event. How can I run a random forest (RF) so that it takes both data sets into account? I cannot append them column-wise because the number of records for each NUM differs.

From the way your question is phrased, I'm guessing you have two pandas dataframes.

You can use pandas.merge to combine the two on NUM. A left join keeps every row of the first dataframe and is probably what you're looking for; if you only want rows whose NUM value appears in both dataframes, use an inner join.

See the documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

Here's how that might look:

pd.merge(df1, df2, how='left', on='NUM')
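To make the difference between the two joins concrete, here is a minimal sketch on toy data. The dataframes and the columns a and b are made up for illustration; only NUM comes from the question. Note that when a NUM value repeats in both dataframes, merge produces one row per matching pair, so the result can have more rows than either input.

```python
import pandas as pd

# Hypothetical toy data: only NUM comes from the question, "a" and "b" are invented.
df1 = pd.DataFrame({"NUM": [1, 1, 2, 3], "a": [0.1, 0.2, 0.3, 0.4]})
df2 = pd.DataFrame({"NUM": [1, 2, 2, 4], "b": [10, 20, 30, 40]})

# Left join: keep every row of df1; "b" is NaN where df2 has no matching NUM.
left = pd.merge(df1, df2, how="left", on="NUM")

# Inner join: keep only NUM values present in both dataframes.
inner = pd.merge(df1, df2, how="inner", on="NUM")

print(left)
print(inner)
```

If the row multiplication is unwanted, one option is to aggregate each dataframe to one row per NUM before merging; whether that suits your data is a judgment call.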

Alternatively, you could stack the two datasets row-wise, sharing NUM as a single column while each dataset keeps its own independent columns; the cells for columns that belong to the other dataset would be left empty. Whether the results will be any good depends heavily on your data.
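A rough sketch of that stacking idea, under several assumptions: the toy data, the column names a, b and label, the sentinel value -1.0 used to fill the empty cells, and the decision to leave NUM out of the features are all illustrative choices, not something the question specifies.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data; "label" stands in for the event label derived from NUM.
df1 = pd.DataFrame({"NUM": [1, 1, 2, 3], "a": [0.1, 0.2, 0.3, 0.4], "label": [1, 1, 0, 0]})
df2 = pd.DataFrame({"NUM": [1, 2, 2, 3], "b": [10.0, 20.0, 30.0, 40.0], "label": [1, 0, 0, 0]})

# Stack row-wise: NUM and label are shared, "a" and "b" are NaN where not applicable.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)

# Assumption: NUM is treated as an identifier rather than a predictor, and the
# empty cells are filled with a sentinel so any scikit-learn version accepts them.
features = stacked.drop(columns=["NUM", "label"]).fillna(-1.0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features, stacked["label"])
preds = clf.predict(features)
```

With this layout the forest can learn from both sets of variables, but it can also learn to tell the two sources apart purely from which columns are filled, so the result should be checked on held-out data.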
