Adding new columns to a Pandas DataFrame from function output

Question

I wrote a function to estimate the parameters of a simple linear regression. The function produces several outputs, and its inputs are two lists. I also have an initial DataFrame df from which I derived the two lists.

I want to add some of the function's outputs to the initial DataFrame as new columns, or alternatively get them as new lists outside the function.

For example:

def predict(X, Y):
    beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
    beta0 = mean_Y - beta1 * mean_X

    y_hat = [beta0 + beta1*X[i] for i in range(len(X))]

    return df.assign(prediction = y_hat)

Here, mean_X and mean_Y are the sample averages of list X and list Y, respectively.

I also tried numpy.insert() to add y_hat, not to the initial DataFrame, but to X after converting it to a NumPy array.

I have not managed to get the desired result. Can someone help me?

Answer 1

As far as I understand your question, you want to apply your function and store its results in existing or new columns. If that is the case, here is one way to do it. If not, let me know and I will remove the answer. Thanks.

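import pandas as pd

def Somefunction(x, y):
    a = 2 * x
    b = 3 * y
    # Return both results as a Series; the index labels become the new column names
    return pd.Series([a, b], index=['YourColumn1', 'YourColumn2'])

df = pd.read_csv('YourFile')

# Apply the function row by row (axis=1) and join the resulting columns onto df
df = df.join(df.apply(lambda x:
    Somefunction(x['ColumnYouWantToApplyFunctionReturnValue a'],
                 x['ColumnYouWantToApplyFunctionReturnValue B']), axis=1))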
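
For anyone who wants to try this pattern without a CSV file, here is a minimal self-contained sketch. The helper name scale_pair, the column names x and y, and the output labels double_x and triple_y are made up for illustration and are not from the question.

```python
import pandas as pd

def scale_pair(x, y):
    # Hypothetical helper: returns two values as a labelled Series,
    # so each label becomes a column when applied row-wise.
    return pd.Series([2 * x, 3 * y], index=["double_x", "triple_y"])

# Small in-memory frame standing in for the CSV data (assumed columns)
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# apply(..., axis=1) calls the function once per row and yields a DataFrame
new_cols = df.apply(lambda row: scale_pair(row["x"], row["y"]), axis=1)

# join aligns on the index and appends the new columns
result = df.join(new_cols)
```

Row-wise apply is convenient but can be slow on large frames, so it is probably best kept for functions that cannot be written with whole-column operations.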

Answer 2

Your code isn't very clear. What are the mean_X and mean_Y variables?

EDIT: Added variable declarations.

Anyhow, here's a simple suggestion:

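import numpy as np

def predict(X, Y, df):
    # Sample means of the two input lists
    mean_X = np.mean(X)
    mean_Y = np.mean(Y)
    # Least-squares slope and intercept
    beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
    beta0 = mean_Y - beta1 * mean_X
    # Fitted values, one per element of X
    y_hat = [beta0 + beta1*X[i] for i in range(len(X))]
    # Adds the column to df in place and returns the same DataFrame
    df['prediction'] = y_hat
    return df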
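
Since the question also asks about getting several outputs back from the function, here is one possible variant, sketched under a few assumptions: the function name fit_line and the sample data are hypothetical, and NumPy array arithmetic is swapped in for the list comprehensions. It returns the coefficients together with the fitted values and leaves the original DataFrame untouched by using assign.

```python
import numpy as np
import pandas as pd

def fit_line(x, y):
    # Hypothetical helper: returns intercept, slope and fitted values
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Same least-squares formulas as above, written with array operations
    beta1 = (x_centered * (y - y.mean())).sum() / (x_centered ** 2).sum()
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1, beta0 + beta1 * x

# Made-up data lying exactly on y = 2x + 1
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 5.0, 7.0, 9.0]})

beta0, beta1, y_hat = fit_line(df["x"], df["y"])

# assign returns a new DataFrame; df itself keeps its original columns
df_with_pred = df.assign(prediction=y_hat)
```

Returning a tuple keeps the coefficients available outside the function, while the caller decides which outputs become columns.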

A cleverer way to proceed would be to use the apply() method on your DataFrame.
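
A rough sketch of what the apply() route could look like, assuming the coefficients have already been estimated. The column name x and the values of beta0 and beta1 are placeholders chosen for illustration.

```python
import pandas as pd

# Placeholder data and coefficients (assumed, not from the question)
df = pd.DataFrame({"x": [0.0, 1.0, 2.0]})
beta0, beta1 = 1.0, 2.0

# axis=1 passes each row to the lambda; the result is a Series
# aligned with df's index, which can be stored as a new column
df["prediction"] = df.apply(lambda row: beta0 + beta1 * row["x"], axis=1)
```

For a formula this simple, the column expression beta0 + beta1 * df["x"] gives the same values without a per-row function call.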

2 answers

Answer 1: 2, 2018-06-01 12:44:44

Answer 2: 1, accepted, 2018-06-01 12:24:14
