
How does one append rows to a dataframe from a loop using Pandas?

I'm running a loop that appends values to an empty dataframe defined outside of the loop. However, when the loop finishes, the dataframe is still empty. I'm not sure what's going on. The goal is to find the power value that results in the lowest sum of squared residuals.

Example code below:

import numpy as np
import pandas as pd
import tweedie

power_list = np.arange(1.3, 2, .01)
mean = 353.77
std = 17298.24
size = 860310
x = tweedie.tweedie(mu = mean, p = 1.5, phi = 50).rvs(size)
variance = 299228898.89

sum_ssr_df = pd.DataFrame(columns = ['power', 'dispersion', 'ssr'])

for i in power_list:

    power = i

    phi = variance/(mean**power)

    tvs = tweedie.tweedie(mu = mean, p = power, phi = phi).rvs(len(x))

    sort_tvs = np.sort(tvs)

    df = pd.DataFrame([x, sort_tvs]).transpose()
    df.columns = ['actual', 'random']
    df['residual'] = df['actual'] - df['random']
    ssr = df['residual']**2
    sum_ssr = np.sum(ssr)
    df_i = pd.DataFrame([i, phi, sum_ssr])
    df_i = df_i.transpose()
    df_i.columns = ['power', 'dispersion', 'ssr']
    sum_ssr_df.append(df_i)    

sum_ssr_df[sum_ssr_df['ssr'] == sum_ssr_df['ssr'].min()]

What exactly am I doing incorrectly?

This code isn't as efficient as it could be, as ALollz noted. Each call to append builds a new dataframe in memory and returns it, rather than modifying the original in place (I'm oversimplifying here).

The error in your code is:

 sum_ssr_df.append(df_i)

should be:

 sum_ssr_df = sum_ssr_df.append(df_i)
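
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent install the reassignment would be written as sum_ssr_df = pd.concat([sum_ssr_df, df_i], ignore_index=True). To avoid copying the dataframe on every iteration, one option is to collect the rows in a plain list and build the dataframe once after the loop. Below is a rough sketch of that pattern, not your exact code: simulate_ssr is a hypothetical helper, and it draws from a gamma distribution as a stand-in so the example runs without the tweedie package.

```python
import numpy as np
import pandas as pd

mean = 353.77
variance = 299228898.89
power_list = np.arange(1.3, 2, 0.01)

rng = np.random.default_rng(0)
# Stand-in for the observed data `x` (assumption: any sorted 1-D array works here).
actual = np.sort(rng.gamma(shape=2.0, scale=100.0, size=1_000))


def simulate_ssr(power):
    """Hypothetical stand-in for the Tweedie draw in the question.

    Returns the sum of squared residuals between the sorted observed
    values and a sorted random sample of the same size.
    """
    simulated = np.sort(rng.gamma(shape=power, scale=100.0, size=actual.size))
    return float(np.sum((actual - simulated) ** 2))


rows = []  # a plain Python list; list.append DOES modify in place
for power in power_list:
    phi = variance / (mean ** power)
    rows.append({"power": power, "dispersion": phi, "ssr": simulate_ssr(power)})

# Build the dataframe once, after the loop.
sum_ssr_df = pd.DataFrame(rows, columns=["power", "dispersion", "ssr"])

# Row with the lowest sum of squared residuals.
best = sum_ssr_df.loc[sum_ssr_df["ssr"].idxmin()]
print(best)
```

Swapping simulate_ssr for the real tweedie call should leave the rest unchanged; the only structural difference from the question is that rows accumulate in a list instead of in the dataframe itself.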
