
How does one append rows to a dataframe from a loop using Pandas?

I'm running a loop that appends values to an empty dataframe defined outside of the loop. However, when the loop finishes, the dataframe remains empty. I'm not sure what's going on. The goal is to find the power value that results in the lowest sum of squared residuals.

Example code below:

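import numpy as np
import pandas as pd
import tweedie

power_list = np.arange(1.3, 2, .01)
mean = 353.77
std = 17298.24
size = 860310
# reference sample of `size` draws from a Tweedie distribution
x = tweedie.tweedie(mu = mean, p = 1.5, phi = 50).rvs(size)
variance = 299228898.89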

sum_ssr_df = pd.DataFrame(columns = ['power', 'dispersion', 'ssr'])

for i in power_list:

    power = i

    phi = variance/(mean**power)

    tvs = tweedie.tweedie(mu = mean, p = power, phi = phi).rvs(len(x))

    sort_tvs = np.sort(tvs)

    df = pd.DataFrame([x, sort_tvs]).transpose()
    df.columns = ['actual', 'random']
    df['residual'] = df['actual'] - df['random']
    ssr = df['residual']**2
    sum_ssr = np.sum(ssr)
    df_i = pd.DataFrame([i, phi, sum_ssr])
    df_i = df_i.transpose()
    df_i.columns = ['power', 'dispersion', 'ssr']
    sum_ssr_df.append(df_i)    

sum_ssr_df[sum_ssr_df['ssr'] == sum_ssr_df['ssr'].min()]

What exactly am I doing incorrectly?

This code isn't as efficient as it could be, as ALollz noted. Each time you append, pandas basically creates a new dataframe in memory (I'm oversimplifying here).
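One common way to avoid rebuilding the dataframe on every iteration is to collect the rows in a plain Python list and construct the dataframe once after the loop. The sketch below assumes a hypothetical helper, toy_ssr, that stands in for the Tweedie-based sum of squared residuals so the example runs without the tweedie package; mean and variance are the values from the question.

```python
import numpy as np
import pandas as pd

mean = 353.77
variance = 299228898.89
power_list = np.arange(1.3, 2, .01)

def toy_ssr(power):
    """Hypothetical stand-in for the Tweedie-based SSR in the question."""
    return (power - 1.5) ** 2

rows = []  # appending to a list does not copy the rows collected so far
for power in power_list:
    phi = variance / (mean ** power)
    rows.append({'power': power, 'dispersion': phi, 'ssr': toy_ssr(power)})

# Build the dataframe a single time, after the loop has finished
sum_ssr_df = pd.DataFrame(rows, columns=['power', 'dispersion', 'ssr'])

# Row with the lowest sum of squared residuals
best = sum_ssr_df.loc[sum_ssr_df['ssr'].idxmin()]
```

With the real computation in place of toy_ssr, best['power'] would hold the power value the question is looking for.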

The error in your code is:

 sum_ssr_df.append(df_i)

should be the following, because append returns a new dataframe instead of modifying sum_ssr_df in place:

 sum_ssr_df = sum_ssr_df.append(df_i)
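Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same fix is usually written with pd.concat. The same rule applies there: the result has to be assigned back. A minimal sketch, with made-up row values and sum_ssr_df seeded with one illustrative row (an assumption, to keep the example independent of how pandas handles empty frames in concatenation):

```python
import pandas as pd

cols = ['power', 'dispersion', 'ssr']

# Illustrative values only; they are not results from the question
sum_ssr_df = pd.DataFrame([[1.30, 10.0, 5.0]], columns=cols)
df_i = pd.DataFrame([[1.31, 9.5, 4.0]], columns=cols)

# Without assignment the combined frame is created and then discarded
pd.concat([sum_ssr_df, df_i], ignore_index=True)
rows_without_assignment = len(sum_ssr_df)  # still 1

# Assigning the result back keeps the new row
sum_ssr_df = pd.concat([sum_ssr_df, df_i], ignore_index=True)
rows_with_assignment = len(sum_ssr_df)  # now 2
```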
