
Add a new column to a pandas DataFrame from a function's output

I wrote a function to estimate the parameters of a simple linear regression. The function produces several outputs, and its inputs are two lists. I also have an initial DataFrame, df, from which I derived the two lists.

I want to add some of the function's outputs to the initial DataFrame as new columns, or alternatively have them available as new lists outside the function.

For example:

def predict(X, Y):
    beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
    beta0 = mean_Y - beta1 * mean_X

    y_hat = [beta0 + beta1*X[i] for i in range(len(X))]

    return df.assign(prediction=y_hat)

Here, mean_X and mean_Y are the sample averages of list X and list Y, respectively.
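For reference, the expressions for beta1, beta0 and y_hat in predict are the ordinary least-squares estimates for a simple linear regression, with \bar{X} and \bar{Y} standing for mean_X and mean_Y:

```latex
\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X},
\qquad
\hat{y}_i = \hat\beta_0 + \hat\beta_1 X_i
```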

I also tried using numpy.insert() to add y_hat not to the initial DataFrame but to X, which I had converted into a NumPy array.
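A possible reason neither attempt appeared to work (an assumption, since the calling code isn't shown): both DataFrame.assign and numpy.insert return a new object and leave their input unchanged, so the result has to be assigned back. The small DataFrame, its column names x and y, and the 2-D shape of X below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})
y_hat = [2.0, 4.0, 6.0]

# assign() returns a NEW DataFrame; df itself is not changed
df.assign(prediction=y_hat)
print("prediction" in df.columns)  # False

# Capture the return value to keep the new column
df = df.assign(prediction=y_hat)
print("prediction" in df.columns)  # True

# numpy.insert() also returns a new array. Here X is assumed to be a
# 2-D array with one column, and y_hat is inserted as a new last column.
X = np.array([[1.0], [2.0], [3.0]])
X_with_pred = np.insert(X, X.shape[1], y_hat, axis=1)
print(X_with_pred.shape)  # (3, 2)
print(X.shape)            # (3, 1): the original array is unchanged
```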

I haven't managed to get the desired result. Can someone help me?
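For the second option mentioned in the question (keeping the outputs as separate objects outside the function), one common Python pattern is to return several values as a tuple and unpack them at the call site. A minimal sketch; the function name fit_simple_regression, the column names x and y, and the data are hypothetical:

```python
import numpy as np
import pandas as pd

def fit_simple_regression(X, Y):
    """Hypothetical helper: return (beta0, beta1, y_hat) for a least-squares fit."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mean_X, mean_Y = X.mean(), Y.mean()
    beta1 = np.sum((X - mean_X) * (Y - mean_Y)) / np.sum((X - mean_X) ** 2)
    beta0 = mean_Y - beta1 * mean_X
    y_hat = beta0 + beta1 * X
    # Several outputs are returned together as a tuple
    return beta0, beta1, y_hat

# Hypothetical data where y = 2x + 1 exactly
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 5.0, 7.0, 9.0]})

# Unpack the tuple into separate variables outside the function...
beta0, beta1, y_hat = fit_simple_regression(df["x"].tolist(), df["y"].tolist())

# ...and add whichever outputs you need to df as columns
df["prediction"] = y_hat
print(beta0, beta1)
```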

As far as I understand your question, you want to apply your function to existing columns and store its outputs in new columns. If that is the case, here is one way to do it. If not, let me know and I will remove the answer. Thanks!

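import pandas as pd

def Somefunction(x, y):
    # Compute two values from one row's inputs
    a = 2 * x
    b = 3 * y
    # Returning a Series makes df.apply(..., axis=1) expand the
    # result into one new column per index label
    return pd.Series([a, b], index=['YourColumn1', 'YourColumn2'])

df = pd.read_csv('YourFile')

# Apply the function row by row, then join the new columns onto df
df = df.join(df.apply(lambda row: Somefunction(row['YourInputColumnA'],
                                               row['YourInputColumnB']), axis=1))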
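To see what the apply-and-join pattern produces without a CSV file, here is a small self-contained run. The in-memory DataFrame and the input column names a and b are hypothetical stand-ins for pd.read_csv('YourFile') and your real columns:

```python
import pandas as pd

def Somefunction(x, y):
    a = 2 * x
    b = 3 * y
    # A Series return value is expanded into columns by apply(axis=1)
    return pd.Series([a, b], index=['YourColumn1', 'YourColumn2'])

# Hypothetical data in place of pd.read_csv('YourFile')
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# One call per row; the result is a DataFrame with the two new columns
new_cols = df.apply(lambda row: Somefunction(row['a'], row['b']), axis=1)
df = df.join(new_cols)
print(df)
```

Row-wise apply() calls the function once per row, which is convenient but slow on large frames; for simple arithmetic like this, operating on whole columns (for example df['a'] * 2) gives the same values.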

Your code isn't very clear. What are the mean_X and mean_Y variables?

EDIT: Added the variable declarations.

Anyhow, here's a simple suggestion:

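import numpy as np

def predict(X, Y, df):
    # Sample means of the two input lists
    mean_X = np.mean(X)
    mean_Y = np.mean(Y)
    # Least-squares slope and intercept
    beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
    beta0 = mean_Y - beta1 * mean_X
    # Fitted value for each observation
    y_hat = [beta0 + beta1*X[i] for i in range(len(X))]
    # Store the predictions as a new column (this modifies df in place)
    df['prediction'] = y_hat
    return df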
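A quick usage sketch of predict, assuming the inputs are plain lists taken from two columns of df (the column names x and y and the data are hypothetical). Converting to lists matters here because X[i] on a pandas Series with a non-default index would be a label lookup. Since the function assigns df['prediction'] directly, df is modified in place and the same object is returned:

```python
import numpy as np
import pandas as pd

def predict(X, Y, df):
    mean_X = np.mean(X)
    mean_Y = np.mean(Y)
    beta1 = sum([(X[i] - mean_X)*(Y[i] - mean_Y) for i in range(len(X))]) / sum([(X[i] - mean_X)**2 for i in range(len(X))])
    beta0 = mean_Y - beta1 * mean_X
    y_hat = [beta0 + beta1*X[i] for i in range(len(X))]
    df['prediction'] = y_hat
    return df

# Hypothetical data where y = 2x + 1 exactly
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [3.0, 5.0, 7.0, 9.0]})
X = df['x'].tolist()
Y = df['y'].tolist()

result = predict(X, Y, df)
print(result['prediction'].tolist())  # [3.0, 5.0, 7.0, 9.0]
print(result is df)                   # True: same object, modified in place
```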

A cleverer way to proceed would be to call the apply() method on your DataFrame.
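One possible reading of the apply() suggestion, sketched under assumptions: the coefficients depend on whole columns, so they still have to be computed first, and DataFrame.apply(..., axis=1) then evaluates the fitted line row by row. The column names x and y and the data are hypothetical; plain vectorized arithmetic, shown last, gives the same result and is usually faster.

```python
import pandas as pd

# Hypothetical data where y = 2x + 1 exactly
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [3.0, 5.0, 7.0, 9.0]})

# The coefficients need the whole columns, so compute them before apply()
mean_x, mean_y = df['x'].mean(), df['y'].mean()
beta1 = ((df['x'] - mean_x) * (df['y'] - mean_y)).sum() / ((df['x'] - mean_x) ** 2).sum()
beta0 = mean_y - beta1 * mean_x

# apply() with axis=1 evaluates the fitted line once per row
df['prediction'] = df.apply(lambda row: beta0 + beta1 * row['x'], axis=1)

# Equivalent vectorized form without apply()
df['prediction_vec'] = beta0 + beta1 * df['x']
```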
