How do I use enumerate in Python to compute STD of a list?
I'm trying to compute the standard deviation of a list vr. The list has 32 elements, each an array of size 3980. Each array holds a value at every height level (3980 heights).
First I split the data into 15-minute chunks, where the observation times are given in raytimes. raytimes is also a list of size 32 (containing just the time of each observation in vr).
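The 15-minute chunking can also be written as arithmetic instead of an if/elif chain. This is a minimal sketch, assuming raytimes are fractions of an hour (0.25 = 15 minutes), that times are non-negative, and that everything from 0.75 onward belongs to the last window, as with the else branch in the code below. window_index is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def window_index(raytimes, width=0.25, n_windows=4):
    """Hypothetical helper: return the 15-minute window (0-3) of each time.

    Assumes non-negative times in hours; anything at or past the last
    boundary is clipped into the final window, like the `else` branch.
    """
    t = np.asarray(raytimes, dtype=float)
    # Floor division gives the window number; np.minimum caps it at the last window
    return np.minimum((t // width).astype(int), n_windows - 1)


print(window_index([0, 0.1, 0.2, 0.3, 0.6, 0.9, 2.0]))  # [0 0 0 1 2 3 3]
```

Window 0 corresponds to w1, window 1 to w2, and so on.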
I want the standard deviation computed at each height level, so that I end up with one final array of size 3980. That part works in my code. However, my code does not produce the correct standard deviation values when I test it: the values written to w1sd, w2sd, etc. are wrong (although each array has the correct size of 3980 elements). I assume I am mixing up the indices when computing the standard deviation.
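For data that has already been assigned to one window, "one standard deviation per height" is a reduction over the time axis. A minimal sketch, assuming the rows of a window are stacked into a 2-D NumPy array of shape (n_times, n_heights), which for the full dataset would be (32, 3980); here it reuses the three example rows given below:

```python
import numpy as np

# Rows are observation times, columns are heights (same layout as vr)
window_rows = np.array([
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
])

# axis=0 collapses the time axis, leaving one value per height;
# nanstd ignores NaN entries instead of propagating them
per_height_sd = np.nanstd(window_rows, axis=0)

print(per_height_sd.shape)  # (4,)
print(np.round(per_height_sd, 3))
```

With the real data, the same call on a (32, 3980) array would return 3980 values without any explicit index bookkeeping.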
Below are example values from the dataset. All of the data should fall into w1 and w1sd, since the raytimes in this example are all within 15 minutes (< 0.25). I want to compute the standard deviation of the first element of each row of vr, that is, the standard deviation of 2.0, 3.1 and 2.1, then of the second element, i.e. the standard deviation of 3.1, 4.1 and nan, and so on. The result for w1sd SHOULD BE [0.497, 0.499, 1.0, 7.5], but the code below, which uses nanstd, gives w1sd = [0.497, 0.77, 1.31, 5.301] instead. Is something wrong with nanstd, or with my indexing?
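As a check on the expected numbers: numpy.nanstd computes the population standard deviation (ddof=0 by default) and drops NaN before doing so. The sketch below works the first height out by hand and compares it with nanstd. It also shows that plain numpy.std returns nan when a NaN is present, that the sample standard deviation (ddof=1) gives a different value, and that nanstd of 3.1 and 4.1 comes out at 0.5 to three decimals.

```python
import math

import numpy as np

first_height = [2.0, 3.1, 2.1]
second_height = [3.1, 4.1, np.nan]

# Population standard deviation by hand: sqrt(mean of squared deviations)
mean = sum(first_height) / len(first_height)
by_hand = math.sqrt(sum((v - mean) ** 2 for v in first_height) / len(first_height))

print(round(by_hand, 3))                                 # 0.497
print(round(float(np.nanstd(first_height)), 3))          # 0.497 (ddof=0 default)
print(round(float(np.nanstd(first_height, ddof=1)), 3))  # 0.608 (sample std)
print(np.std(second_height))                             # nan: NaN propagates
print(round(float(np.nanstd(second_height)), 3))         # 0.5: NaN is dropped
```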
from numpy import nan, nanstd

vr = [
    [2.0, 3.1, 4.1, nan],
    [3.1, 4.1, nan, 5.1],
    [2.1, nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

# w1..w4 collect values per 15-minute window; w1sd..w4sd collect one std per height
w1, w2, w3, w4 = [], [], [], []
w1sd, w2sd, w3sd, w4sd = [], [], [], []

for j, h in enumerate(Height):
    for i, t in enumerate(raytimes):
        if raytimes[i] < 0.25:
            w1.append(float(vr[i][j]))
        elif 0.25 <= raytimes[i] < 0.5:
            w2.append(float(vr[i][j]))
        elif 0.5 <= raytimes[i] < 0.75:
            w3.append(float(vr[i][j]))
        else:
            w4.append(float(vr[i][j]))
    w1sd.append(round(nanstd(w1), 3))
    w2sd.append(round(nanstd(w2), 3))
    w3sd.append(round(nanstd(w3), 3))
    w4sd.append(round(nanstd(w4), 3))
w1 = []
w2 = []
w3 = []
w4 = []
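The wrong values are consistent with the window lists not being emptied between heights. If w1 to w4 are only reset after the outer loop (or never), values from earlier heights pile up: the second entry of w1sd becomes nanstd of [2.0, 3.1, 2.1, 3.1, 4.1, nan], which is 0.77, and the third and fourth entries work out to 1.31 and 5.301 the same way. Below is a minimal sketch of the loop with the lists recreated at the start of every height. The helper sd is a hypothetical addition that returns nan for a window with no finite data, which avoids NumPy's empty-slice warning.

```python
import numpy as np

nan = np.nan
vr = [
    [2.0, 3.1, 4.1, nan],
    [3.1, 4.1, nan, 5.1],
    [2.1, nan, 6.1, 20.1],
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]


def sd(values):
    """Hypothetical helper: population std ignoring NaN, rounded to 3 places."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(arr).all():  # also True for an empty window
        return nan
    return round(float(np.nanstd(arr)), 3)


w1sd, w2sd, w3sd, w4sd = [], [], [], []
for j, _h in enumerate(Height):
    # New, empty buckets for every height so earlier heights cannot leak in
    w1, w2, w3, w4 = [], [], [], []
    for i, t in enumerate(raytimes):
        value = float(vr[i][j])
        if t < 0.25:
            w1.append(value)
        elif t < 0.5:
            w2.append(value)
        elif t < 0.75:
            w3.append(value)
        else:
            w4.append(value)
    w1sd.append(sd(w1))
    w2sd.append(sd(w2))
    w3sd.append(sd(w3))
    w4sd.append(sd(w4))

print(w1sd)  # [0.497, 0.5, 1.0, 7.5]
print(w2sd)  # [nan, nan, nan, nan]
```

The lower bounds in the elif tests are dropped because earlier branches already exclude smaller times; the behavior is the same as the chained comparisons.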
I would consider using pandas for this. It is a library for efficient processing of datasets held in NumPy arrays, and it takes the looping and indexing out of your hands.
In this case I would define a DataFrame with N_raytimes rows and N_Height columns, which makes it easy to slice and aggregate the data any way you like.
This code gives the expected output.
import pandas as pd
import numpy as np
vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1]
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]
# Define a dataframe with the data
df = pd.DataFrame(vr, columns=Height, index=raytimes)
df.columns.name = "Height"
df.index.name = "raytimes"
# Split it out (this could be more elegant)
w1 = df[df.index < 0.25]
w2 = df[(df.index >= 0.25) & (df.index < 0.5)]
w3 = df[(df.index >= 0.5) & (df.index < 0.75)]
w4 = df[df.index >= 0.75]
# Compute standard deviations
w1sd = w1.std(axis=0, ddof=0).values
w2sd = w2.std(axis=0, ddof=0).values
w3sd = w3.std(axis=0, ddof=0).values
w4sd = w4.std(axis=0, ddof=0).values
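On the "this could be more elegant" point: one option is to label each row with its window using pandas.cut and let groupby compute every window at once. This is a sketch under the same assumptions as above (raytimes in hours, non-negative); the labels "w1" to "w4" are illustrative. pandas' std defaults to ddof=1, so ddof=0 is passed explicitly to match numpy.nanstd.

```python
import numpy as np
import pandas as pd

vr = [
    [2.0, 3.1, 4.1, np.nan],
    [3.1, 4.1, np.nan, 5.1],
    [2.1, np.nan, 6.1, 20.1],
]
Height = [10.0, 20.0, 30.0, 40]
raytimes = [0, 0.1, 0.2]

df = pd.DataFrame(vr, columns=Height, index=raytimes)

# Half-open bins [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, inf) mirror the if/elif chain
window = pd.cut(
    df.index,
    bins=[0, 0.25, 0.5, 0.75, np.inf],
    right=False,
    labels=["w1", "w2", "w3", "w4"],
)

# One row per non-empty window, one column per height; NaN values are skipped
sd = df.groupby(window, observed=True).std(ddof=0)

w1sd = sd.loc["w1"].to_numpy()
print(w1sd)
```

Because observed=True, windows with no data (w2 to w4 in this example) are simply absent from sd rather than filled with NaN.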