
Pandas: manually calculating the mean or standard deviation

Apart from the min and max values, we also want the mean and variance of Kilometers_Driven for each location. Using the iterative method, do the following:

8.1 Find all of the unique locations in the dataset.

8.2 Start the timer.

8.3 For a unique location, iterate through the dataset once to calculate the mean of Kilometers_Driven.

8.4 For the same unique location, iterate through the dataset once more to calculate the variance of Kilometers_Driven.

8.5 Repeat for all of the unique locations. Iteratively, calculate the mean and variance of Kilometers_Driven for the different locations. Measure the time it takes.

8.6 Stop the timer. Print out the mean and variance of Kilometers_Driven for each location as well as the time elapsed.

My code is below:

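import timeit  # needed for the timer in 8.2 and 8.6

#8.1 Find the unique locations (df is assumed to be an already-loaded DataFrame)
df.Location.unique()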

#8.2
start = timeit.default_timer()

#8.3 Calculating mean of "Kilometers_Driven" manually (For a unique location?)
col_mean = 0.0
for row in range(len(df)):
    col_mean += df.loc[row, "Kilometers_Driven"]
col_mean /= len(df)
print(col_mean)

#8.4 Calculating variance of "Kilometers_Driven" manually
col_var = 0.0
for row in range(len(df)):
    col_var += (df.loc[row, "Kilometers_Driven"] - col_mean)**2
col_var /= len(df) - 1 
print(col_var)

#8.5 How to do?

#8.6 Setting Stop Time
stop = timeit.default_timer()

t_custom = stop - start

print(f"Time elapsed {t_custom} s")

It works, but step 8.3 says: "For a unique location, iterate through the dataset once to calculate the mean of the Kilometers_Driven." In 8.3 I am just calculating the mean of "Kilometers_Driven" manually over the whole dataset, and I am not sure how to correct it. I am also not sure how to do step 8.5. Can anyone help me? Thanks in advance!

for l in df.Location.unique():
    # Accumulate the sum and the row count for this location only
    col_mean = 0.0
    num_rows = 0
    for row in range(len(df)):
        if df.loc[row, 'Location'] == l:
            num_rows += 1
            col_mean += df.loc[row, "Kilometers_Driven"]
    col_mean = col_mean / num_rows

    print('Location: %s mean %.2f' % (l, col_mean))
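
The loop above handles the mean (8.3) for every location. A minimal sketch of how the same pattern might be extended to the variance (8.4), the repetition over all locations (8.5), and the timing (8.2, 8.6) follows. The sample DataFrame and the helper name stats_by_location are hypothetical and exist only for illustration. The sketch assumes df has the default 0..n-1 index, as the question's use of df.loc[row, ...] implies, and it uses the sample variance (n - 1 denominator) to match the question's code.

```python
import timeit

import pandas as pd

# Hypothetical sample data standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Location": ["Mumbai", "Pune", "Mumbai", "Pune", "Delhi", "Delhi", "Mumbai"],
    "Kilometers_Driven": [72000, 41000, 46000, 87000, 40000, 60000, 35000],
})


def stats_by_location(df):
    """Return {location: (mean, sample variance)} using explicit loops."""
    results = {}
    for loc in df["Location"].unique():  # 8.1 and 8.5: every unique location
        # 8.3: first pass over the dataset, mean for this location only
        total = 0.0
        count = 0
        for row in range(len(df)):
            if df.loc[row, "Location"] == loc:
                total += df.loc[row, "Kilometers_Driven"]
                count += 1
        mean = total / count

        # 8.4: second pass, sum of squared deviations from that mean
        sq_dev = 0.0
        for row in range(len(df)):
            if df.loc[row, "Location"] == loc:
                sq_dev += (df.loc[row, "Kilometers_Driven"] - mean) ** 2
        # Sample variance is undefined for a single observation
        var = sq_dev / (count - 1) if count > 1 else float("nan")

        results[loc] = (mean, var)
    return results


start = timeit.default_timer()             # 8.2: start the timer
results = stats_by_location(df)
t_custom = timeit.default_timer() - start  # 8.6: stop the timer

for loc, (mean, var) in results.items():
    print(f"Location: {loc} mean {mean:.2f} variance {var:.2f}")
print(f"Time elapsed {t_custom} s")
```

One way to sanity-check the manual results is to compare them with df.groupby("Location")["Kilometers_Driven"].agg(["mean", "var"]), since pandas' var also uses the n - 1 denominator by default.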

