
How to use a random forest classifier in Python for 2 different data sets?

I have 2 data sets with different variables. Both include a variable, say NUM, that helps to identify the occurrence of an event. Using NUM, I was able to identify the event and label it. How can I run a random forest (RF) so that it takes both data sets into account? I am not able to append them column-wise, because the number of records for each NUM differs.

From the way your question is phrased, I'm guessing you have two pandas dataframes.

You can use pandas.merge to pull the two together. All you need is a join of some sort on NUM. A left join might be what you're looking for, but if you only want rows whose NUM value appears in both dataframes, use an inner join.

See the documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

Here's how that might look:

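import pandas as pd

# NUM exists in both dataframes, so join on it directly
# (left_on without right_on raises a MergeError)
merged = pd.merge(df1, df2, how='left', on='NUM')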
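To make the difference between the two join types concrete, here is a minimal sketch on toy data. Only the NUM column comes from the question; the dataframes and the columns `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the NUM column name comes from the question.
df1 = pd.DataFrame({"NUM": [1, 2, 3], "a": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"NUM": [2, 3, 4], "b": [0.2, 0.3, 0.4]})

# Left join: keep every row of df1; "b" is NaN where df2 has no matching NUM.
left = pd.merge(df1, df2, how="left", on="NUM")

# Inner join: keep only the NUM values present in both dataframes.
inner = pd.merge(df1, df2, how="inner", on="NUM")

print(left)
print(inner)
```

Note that if a NUM value occurs several times in both dataframes, the merge returns every pairing of those rows, so the result can be larger than either input.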

You could try keeping NUM as a single shared column and giving the first and second datasets completely independent columns, with the non-matching cells left empty. Whether the results will be any good depends heavily on your data.
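One way this might look is sketched below. It is only an illustration under several assumptions: the toy data, the feature columns `a` and `b`, the label column `event`, and the choice to fill empty cells with a sentinel value of -1 are all hypothetical and not part of the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data; "a", "b" and the "event" label column are assumptions.
df1 = pd.DataFrame({"NUM": [1, 1, 2, 3],
                    "a": [0.1, 0.2, 0.9, 0.8],
                    "event": [0, 0, 1, 1]})
df2 = pd.DataFrame({"NUM": [1, 2, 2, 3],
                    "b": [5.0, 7.0, 8.0, 9.0],
                    "event": [0, 1, 1, 1]})

# Stack the datasets row-wise: NUM and event are shared,
# while "a" and "b" become NaN in rows from the other dataset.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)

# Assumption: replace the empty cells with a sentinel value so that the
# classifier works regardless of whether it accepts NaN inputs.
X = stacked[["NUM", "a", "b"]].fillna(-1.0)
y = stacked["event"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

The sentinel fill is just one option; how missing cells are handled is likely to affect the quality of the model, so it is worth comparing against the merged layout.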

