Row-wise prediction on a Pandas dataframe by passing sklearn.predict to df.apply

Question

Assume we have a Pandas dataframe and a scikit-learn model that was trained (fit) using that dataframe. Is there a way to do row-wise prediction? The use case is to fill in empty values in the dataframe using the predict function of an sklearn model.

I expected this to be possible using the pandas apply function (with axis=1), but I keep getting dimensionality errors.

Using Pandas version '0.22.0' and sklearn version '0.19.1'.

Simple example:

import pandas as pd
from sklearn.cluster import KMeans

data = [[x, y, x * y] for x in range(1, 10) for y in range(10, 15)]

df = pd.DataFrame(data, columns=['input1', 'input2', 'output'])

model = KMeans()
# KMeans is unsupervised, so the second argument (y) is ignored
model.fit(df[['input1', 'input2']], df['output'])

# Fails: apply passes each row as a 1D Series, but predict expects a 2D array
df['predictions'] = df[['input1', 'input2']].apply(model.predict, axis=1)

The resulting dimensionality error:

ValueError: ('Expected 2D array, got 1D array instead:\narray=[ 1. 
10.].\nReshape your data either using array.reshape(-1, 1) if your data has 
a single feature or array.reshape(1, -1) if it contains a single sample.', 
'occurred at index 0')
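
The message follows from what apply hands to the function. A minimal sketch, using a small made-up frame rather than the data above, of the dimensionality each call sees:

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame({"input1": [1, 2, 3], "input2": [10, 11, 12]})

# The whole selection is 2D: (n_samples, n_features).
frame_ndim = df[["input1", "input2"]].to_numpy().ndim

# With axis=1, apply passes each row as a 1D Series,
# so the wrapped function sees one dimension per call.
row_ndims = df[["input1", "input2"]].apply(lambda row: row.values.ndim, axis=1)

print(frame_ndim)
print(row_ndims.tolist())
```

predict expects the 2D layout, which is why each 1D row is rejected.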

Running predict on the whole column works fine:

df['predictions'] = model.predict(df[['input1','input2']])

However, I want the flexibility to use this row-wise.
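
If the row-wise requirement comes mainly from the fill-in use case, one alternative is to keep the whole-column call and restrict it to the rows that need a value with a boolean mask. This is a minimal sketch, not the asker's setup: the KNeighborsRegressor model, the n_neighbors value, and the blanked-out row labels are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

data = [[x, y, x * y] for x in range(1, 10) for y in range(10, 15)]
df = pd.DataFrame(data, columns=["input1", "input2", "output"], dtype=float)

# Blank out a few targets to simulate empty values (arbitrary row labels).
df.loc[[3, 17, 30], "output"] = np.nan

features = ["input1", "input2"]
missing = df["output"].isna()

# Assumed model for this sketch: fit only on rows whose target is known.
model = KNeighborsRegressor(n_neighbors=3)
model.fit(df.loc[~missing, features], df.loc[~missing, "output"])

# Predict only for the rows that need a value and write the results back.
df.loc[missing, "output"] = model.predict(df.loc[missing, features])
```

The mask keeps the 2D input that predict expects, so no per-row reshaping is needed.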

I've tried various approaches to reshape the data first, for example:

import numpy as np

def reshape_predict(row):
    # row is a 1D Series; reshape its values into a single 2D sample
    return model.predict(np.reshape(row.values, (1, -1)))

df[['input1', 'input2']].apply(reshape_predict, axis=1)

This just returns the input with no error, whereas I expect it to return a single column of output values (as an array).

SOLUTION:

Thanks to Yakym for providing a working solution! Trying a few variants based on his suggestion, the easiest solution was to simply wrap the row values in square brackets (I had tried this previously, but without the 0 index on the prediction, with no luck).

df['predictions'] = df[['input1','input2']].apply(lambda x: model.predict([x])[0],axis=1)
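
As a sanity check, the row-wise result can be compared with the whole-column call. A minimal self-contained sketch; n_clusters, n_init and random_state are illustrative choices, and fitting on a plain array is an assumption made here to avoid feature-name warnings on newer sklearn versions:

```python
import pandas as pd
from sklearn.cluster import KMeans

data = [[x, y, x * y] for x in range(1, 10) for y in range(10, 15)]
df = pd.DataFrame(data, columns=["input1", "input2", "output"])
features = df[["input1", "input2"]]

# Illustrative settings; fitting on a plain array keeps feature names out of the model.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(features.to_numpy())

# Row-wise: wrapping the row in a list gives one 2D sample; [0] unwraps the single label.
row_wise = features.apply(lambda x: model.predict([x])[0], axis=1)

# Whole-column call for comparison.
vectorized = model.predict(features.to_numpy())

agree = bool((row_wise.to_numpy() == vectorized).all())
print(agree)
```

The row-wise version calls predict once per row, so it is likely to be much slower than the single whole-column call on large frames.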

Answer 1

Slightly more verbose: you can turn each row into a 2D array by adding a new axis to the values. You will then have to access the prediction with index 0:

df["predictions"] = df[["input1", "input2"]].apply(
    lambda s: model.predict(s.values[None])[0], axis=1
)
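
Indexing with None is NumPy shorthand for np.newaxis. A short sketch, using a made-up row, showing that it yields the same single-sample 2D array as the other ways of reshaping discussed on this page:

```python
import numpy as np

row = np.array([1.0, 10.0])       # one row's values, shape (2,)

a = row[None]                     # new leading axis, as in s.values[None]
b = row[np.newaxis, :]            # same thing spelled out
c = row.reshape(1, -1)            # the form suggested by the error message
d = np.asarray([row])             # wrapping in a list, as in predict([x])

print(a.shape, b.shape, c.shape, d.shape)
```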

Answer 1
0 2018-06-09 07:56:19
