
Why do I get NaN when using .mean()?

This is part of a GIT open course that I am taking in my free time to learn Python. The exercise deals only with NumPy. Below, I create a file path and import the data. I added skip_header because the column names are strings and would otherwise come through as NaN. The data has 33 columns and I need only 5, which I select with usecols.

import numpy as np
fp = 'C:\\Users\\matij\\Documents\\exercise-5-MatijaKordic\\6153237444115dat.csv'
data = np.genfromtxt(fp, skip_header=1, usecols=(0, 2, 22, 27, 28), delimiter=',')
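
As a side note on why skip_header matters: with the default float dtype, genfromtxt fills any field it cannot parse as a number with nan, and that includes empty fields as well as header strings. Below is a minimal sketch using a made-up two-column CSV held in memory; the column names and values are hypothetical and are not taken from the exercise file.

```python
import io
import numpy as np

# Hypothetical CSV: a header row, one complete row, and one row whose
# temperature field is empty.
csv_text = "station,temp\n28450,41.0\n28450,\n"

# Without skip_header, the header row is parsed as data and becomes nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# With skip_header=1, only the empty field is left as nan.
no_header = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(with_header)
print(no_header)
print(np.isnan(no_header).sum())  # number of nan entries that remain
```

So skipping the header removes one source of nan, but missing values inside the data rows can still produce others.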

Next, I need to split the data into separate variables called station, date, temp, temp_max, and temp_min. They correspond to usecols=(0, 2, 22, 27, 28).

station = data[:, 0]
date = data[:, 1]
temp = data[:, 2]
temp_max = data[:, 3]
temp_min = data[:, 4]

After this, I need to calculate the following:

What is the mean Fahrenheit temperature in the data? (the temp variable)

What is the standard deviation of the maximum temperature? (the temp_max variable)

How many unique stations exist in the data? (the station variable)

So, I did this:

temp_mean = temp.mean()
temp_max_std = temp_max.std()
station_count = np.unique(station)

And I get NaN for mean and max. For unique stations I get [28450. 29980.], so I presume I need to somehow add a count?

As for the mean and max:
- Max is NaN, so that is fine. Not sure why I have it in the assignment, but that is a different story.
- Mean, however, is the reason for this question. When I print temp, I get values, so why do I get NaN for temp.mean()?

Here is the link to the CSV if anyone is interested: https://drive.google.com/file/d/1rGneQTfUe2rq1HAPQ06rvLDxzi-ETgKe/view?usp=sharing

I agree with Anubhav's post, but I suggest using np.nanmean(temp) instead, which computes the mean while ignoring the NaN (Not a Number) entries. You will get the same mean: 41.58918641457781. The same applies to max:

print(np.nanmean(temp))
print(np.nanmax(temp))

Output:

41.58918641457781
65.0
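
For the standard deviation asked about in the question, NumPy also provides np.nanstd. The sketch below uses a small made-up array (not the exercise data) to show how one nan propagates through an ordinary reduction and how the nan-aware functions skip it.

```python
import numpy as np

# Hypothetical temperatures with one missing reading.
temp = np.array([40.0, np.nan, 44.0])

print(temp.mean())       # nan: a single nan propagates through the sum
print(np.nanmean(temp))  # 42.0, the nan entry is skipped
print(np.nanstd(temp))   # 2.0, population standard deviation of [40, 44]
print(np.nanmax(temp))   # 44.0
```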

You are getting nan because some of the data in the NumPy array is nan. Try this:

temp_mean = temp[~np.isnan(temp)].mean()
print(temp_mean)
temp_max_std = temp_max[~np.isnan(temp_max)].std()
print(temp_max_std)
station_count = np.unique(station)

Output:

41.58918641457781
9.734807757434636
array([28450., 29980.])
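
np.unique returns the distinct values themselves, so the number of unique stations is the size of that array. A minimal sketch, assuming a small made-up station column in place of data[:, 0]:

```python
import numpy as np

# Hypothetical station column; in the question it comes from data[:, 0].
station = np.array([28450., 28450., 29980., 28450.])

# return_counts=True also reports how many rows each station has.
unique_stations, counts = np.unique(station, return_counts=True)
station_count = unique_stations.size

print(station_count)    # 2
print(unique_stations)  # [28450. 29980.]
print(counts)           # [3 1]
```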
