How do I use enumerate in Python to compute STD of a list?

I'm trying to compute the standard deviation of a list vr. The list has 32 entries, each of which is an array of size 3980. Each element of an array represents a value at a given height (3980 heights).

First I split the data into 15-minute chunks, where the times are given in raytimes. raytimes is also a list of size 32 (containing just the time of each observation in vr).

I want the standard deviation computed at each height level, so that I end up with one final array of size 3980. This part works in my code. However, my code does not produce the correct standard deviation values when I test it. That is to say, the values that are output to w1sd, w2sd etc. are not correct (although the array is the correct size: 3980 elements). I assume I am mixing up the indices when computing the standard deviation.

Below are example values from the dataset. All data should fall into w1 and w1sd, as the raytimes provided in this example are all within 15 minutes (< 0.25). I want to compute the standard deviation of the first element of each row of vr, that is, the standard deviation of 2.0, 3.1 and 2.1, then of the second element, that is, the standard deviation of 3.1, 4.1 and nan, and so on. The result for w1sd SHOULD BE [0.497, 0.499, 1.0, 7.5], but instead the code below gives w1sd = [0.497, 0.77, 1.31, 5.301]. Is something wrong with nanstd, or with my indexing?

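from numpy import nan, nanstd

vr = [
    [2.0, 3.1, 4.1, nan],
    [3.1, 4.1, nan, 5.1],
    [2.1, nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

# Values for the current height, one list per 15-minute window
w1, w2, w3, w4 = [], [], [], []
# Standard deviation per height, one list per 15-minute window
w1sd, w2sd, w3sd, w4sd = [], [], [], []

for j, h in enumerate(Height):
    for i, t in enumerate(raytimes):
        if raytimes[i] < 0.25:
            w1.append(float(vr[i][j]))
        elif 0.25 <= raytimes[i] < 0.5:
            w2.append(float(vr[i][j]))
        elif 0.5 <= raytimes[i] < 0.75:
            w3.append(float(vr[i][j]))
        else:
            w4.append(float(vr[i][j]))
    w1sd.append(round(nanstd(w1), 3))
    w2sd.append(round(nanstd(w2), 3))
    w3sd.append(round(nanstd(w3), 3))
    w4sd.append(round(nanstd(w4), 3))
    # Reset the per-window lists before moving on to the next height
    w1 = []
    w2 = []
    w3 = []
    w4 = []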
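
As a quick check on the two sets of numbers quoted above, here is a minimal NumPy sketch. It computes the per-height standard deviation directly and, separately, the value you would get if the collected values were never cleared between heights. That second case is only an assumption about how the reported numbers might have come about (the loop as posted does reset its lists), and the names per_height, running and accumulated are made up for illustration.

```python
import numpy as np

vr = np.array([
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
])

# Per-height standard deviation, ignoring NaNs: one value per column.
per_height = np.nanstd(vr, axis=0)

# Hypothetical scenario: values pile up across heights because the
# list is never emptied, so each result mixes in earlier heights.
running = []
accumulated = []
for j in range(vr.shape[1]):
    running.extend(vr[:, j])
    accumulated.append(np.nanstd(running))

print(np.round(per_height, 3))
print(np.round(accumulated, 3))
```

Under that assumption the second array comes out close to the values reported for w1sd, while the first is close to the expected ones, which would point at the bookkeeping of the lists rather than at nanstd itself.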

I would consider using pandas for this. It is a library that allows efficient processing of datasets held in numpy arrays and takes all the looping and indexing out of your hands.

In this case I would define a dataframe with N_raytimes rows and N_Height columns, which lets you easily slice and aggregate the data any way you like.

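This code gives the expected output. Note that ddof=0 is passed explicitly: pandas defaults to the sample standard deviation (ddof=1), whereas nanstd uses ddof=0. NaN values are skipped by default.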

import pandas as pd
import numpy as np

vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

# Define a dataframe with the data
df = pd.DataFrame(vr, columns=Height, index=raytimes)
df.columns.name = "Height"
df.index.name = "raytimes"

# Split it out (this could be more elegant)
w1 = df[df.index < 0.25]
w2 = df[(df.index >= 0.25) & (df.index < 0.5)]
w3 = df[(df.index >= 0.5) & (df.index < 0.75)]
w4 = df[df.index >= 0.75]

# Compute standard deviations
w1sd = w1.std(axis=0, ddof=0).values
w2sd = w2.std(axis=0, ddof=0).values
w3sd = w3.std(axis=0, ddof=0).values
w4sd = w4.std(axis=0, ddof=0).values
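
The split into w1 to w4 could probably be written more compactly by binning the index and grouping on the bins. The following is only a sketch of that idea, assuming non-negative raytimes; the names edges, labels, window and sd are made up for illustration.

```python
import numpy as np
import pandas as pd

vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

df = pd.DataFrame(vr, columns=Height, index=raytimes)

# Left-closed bins [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, inf),
# mirroring the >= / < comparisons used for w1..w4.
edges = [0, 0.25, 0.5, 0.75, np.inf]
labels = ["w1", "w2", "w3", "w4"]
window = pd.cut(df.index.to_numpy(), bins=edges, right=False, labels=labels)

# One row per window, one column per height. observed=False keeps
# windows without any data as rows of NaN.
sd = df.groupby(window, observed=False).std(ddof=0)

w1sd = sd.loc["w1"].to_numpy()
print(np.round(w1sd, 3))
```

With the real data this should give one row of 3980 values per 15-minute window, and adding more windows would only mean extending edges and labels.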
