
pandas: apply a list of functions to a data frame

Let's take the Boston data set, available via from sklearn.datasets import load_boston:

import pandas as pd
from sklearn.datasets import load_boston
boston = load_boston()
X = pd.DataFrame(boston["data"])

           0     1      2    3      4      5      6       7     8      9     10      11     12
0     0.00632  18.0   2.31  0.0  0.538  6.575   65.2  4.0900   1.0  296.0  15.3  396.90   4.98
1     0.02731   0.0   7.07  0.0  0.469  6.421   78.9  4.9671   2.0  242.0  17.8  396.90   9.14
2     0.02729   0.0   7.07  0.0  0.469  7.185   61.1  4.9671   2.0  242.0  17.8  392.83   4.03
3     0.03237   0.0   2.18  0.0  0.458  6.998   45.8  6.0622   3.0  222.0  18.7  394.63   2.94
4     0.06905   0.0   2.18  0.0  0.458  7.147   54.2  6.0622   3.0  222.0  18.7  396.90   5.33
5     0.02985   0.0   2.18  0.0  0.458  6.430   58.7  6.0622   3.0  222.0  18.7  394.12   5.21
6     0.08829  12.5   7.87  0.0  0.524  6.012   66.6  5.5605   5.0  311.0  15.2  395.60  12.43

I have built a machine learning model (a random forest, RF) and have obtained all the estimators in the model.

estimators = model.estimators_

You can think of this as a list of functions, each of which takes row-level data and returns a value.

>> estimators = model.estimators_
>> estimators
[DecisionTreeRegressor(criterion='mse', max_depth=60, max_features=8,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=5,
           min_samples_split=12, min_weight_fraction_leaf=0.0,
           presort=False, random_state=1838148368, splitter='best'), DecisionTreeRegressor(criterion='mse', max_depth=60, max_features=8,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=5,
           min_samples_split=12, min_weight_fraction_leaf=0.0,
           presort=False, random_state=1754873550, splitter='best'), DecisionTreeRegressor(criterion='mse', max_depth=60, max_features=8,
           max_leaf_nodes=None, min_impurity_decrease=0.0,....]

I want each estimator/function in the list to be applied to every row in the data frame.

If I don't convert the data to a DataFrame, boston['data'] returns a 2D array, and I can use two for loops to accomplish the above. Assuming X is a 2D array, I can do the following:

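results = []
for x in range(len(X)):
    vals = []
    for estimator in model.estimators_:
        # predict expects a 2D array, so pass the row as a one-row slice
        vals.append(estimator.predict(X[x:x + 1])[0])
    results.append(vals)  # keep this row's per-estimator predictions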

I don't want to use the 2D array option because I want to keep the index information of the DataFrame for future operations.
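For reference, one way to get per-estimator predictions while keeping the DataFrame index is to let each tree predict for all rows at once and rebuild a DataFrame on the original index. This is only a sketch: the synthetic data, the row labels, and the small forest below are stand-ins chosen for illustration, not the Boston data or the model from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: 20 rows, 13 features, with a custom index.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 13)),
                 index=[f"row{i}" for i in range(20)])
y = rng.normal(size=20)

# Fit on the raw array so no feature names are involved.
values = X.to_numpy()
model = RandomForestRegressor(n_estimators=5, random_state=0).fit(values, y)

# One column per tree; each tree predicts for every row in a single call.
per_tree = pd.DataFrame(
    {i: est.predict(values) for i, est in enumerate(model.estimators_)},
    index=X.index,  # the original index is carried over
)
```

Here per_tree has one row per row of X and one column per estimator, so it can be joined back to X on the index.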

In the latest version of pandas, df.agg should be able to do exactly this.

Unfortunately, it appears to be broken in the current version when axis=1: https://github.com/pandas-dev/pandas/issues/16679

Here's a hacky way around it:

X.T.agg(estimators).T
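To illustrate the transpose trick in isolation, here is a minimal sketch that uses ordinary aggregation names ('min', 'max') in place of the estimators; the small DataFrame and its index labels are made up for the example. Note that fitted scikit-learn estimators are not callable themselves, so with a real model each one would presumably need to be wrapped in a function that calls its predict method.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with a non-default index.
X = pd.DataFrame(
    {"a": [1.0, 4.0, 7.0], "b": [2.0, 5.0, 8.0], "c": [3.0, 6.0, 9.0]},
    index=["r0", "r1", "r2"],
)

# Transpose so rows become columns, aggregate each column with every
# function in the list, then transpose back: one output column per function.
out = X.T.agg(["min", "max"]).T
```

The result keeps the index of X, with one column for each function in the list.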
