Finding specific value combinations in a pandas dataframe

Question

I am preparing a dataframe for machine learning. The data set contains weather data from several weather stations in Australia over a period of 10 years. One of the measured attributes is Evaporation, and about 50% of its values are missing. Now I want to find out whether the missing values are evenly distributed over all weather stations, or whether roughly half of the weather stations simply never measured Evaporation.
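A minimal sketch of that check, assuming the data sits in a DataFrame named `rain_df` with the `Location` and `Evaporation` columns used in this question. The small frame below is made-up toy data, not the real dataset. Averaging a boolean NaN mask per station gives that station's fraction of missing values, so a fraction of 1.0 marks a station that never reported Evaporation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real weather dataset.
rain_df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Albury", "Sydney", "Sydney", "Perth", "Perth"],
    "Evaporation": [np.nan, np.nan, np.nan, 4.2, np.nan, 6.0, 5.8],
})

# isna() is True for every missing value; the mean of a boolean
# Series is the share of True values, i.e. the fraction of NaNs.
missing_frac = rain_df["Evaporation"].isna().groupby(rain_df["Location"]).mean()
print(missing_frac)

# Stations where every Evaporation value is missing.
never_measured = missing_frac[missing_frac == 1.0].index.tolist()
print(never_measured)  # ['Albury']
```

If most fractions sit near 0.5, the gaps are spread across stations; a mix of values at 1.0 and near 0.0 suggests that some stations simply never measured the attribute.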

How can I find out how the values of one attribute are distributed in combination with another attribute? I basically want to loop over the weather stations and get a count of NaNs and normal values for each one.
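An explicit loop is not needed for this: a `groupby` on the station column produces both counts for every station at once. This is a sketch on hypothetical toy data, assuming the same `rain_df`, `Location` and `Evaporation` names as the question. It uses pandas named aggregation, where `"count"` counts non-NaN values and the lambda counts the NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real rain_df.
rain_df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Sydney", "Sydney", "Sydney", "Perth"],
    "Evaporation": [np.nan, 3.1, 4.2, np.nan, np.nan, 6.0],
})

# One row per station:
#   measured -> number of non-NaN Evaporation values ("count" skips NaN)
#   missing  -> number of NaN values (sum of the boolean NaN mask)
counts = rain_df.groupby("Location")["Evaporation"].agg(
    measured="count",
    missing=lambda s: s.isna().sum(),
)
print(counts)
```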

rain_df.query('Location == "Albury"').Location.count()

This gives me the number of measurement points from the weather station in Albury. Now how can I find out how many of the Albury measurements are NaN compared to normal (non-NaN) measurements?
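For a single station, one option is to select the Evaporation column for that station's rows and count NaN and non-NaN values separately. Note that `.Location.count()` counts rows, because the filtered Location column never contains NaN; the count has to be taken on the Evaporation column. The toy data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real rain_df.
rain_df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Albury", "Albury", "Sydney"],
    "Evaporation": [np.nan, 3.1, np.nan, 2.5, 4.2],
})

# Evaporation values for the Albury rows only.
albury_evap = rain_df.loc[rain_df["Location"] == "Albury", "Evaporation"]

n_missing = albury_evap.isna().sum()    # NaN measurements
n_measured = albury_evap.notna().sum()  # same as albury_evap.count()
print(n_missing, n_measured)
```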

Answer 1

You can use .isnull() to mask a series, giving True for NaNs and False for everything else. Then you can use .value_counts(normalize=True) to get the proportions of NaN and non-NaN values in that series.

rain_df.query('Location == "Albury"').Evaporation.isnull().value_counts(normalize=True)
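A self-contained run of this approach on hypothetical toy data might look like the following. The mask has to be built on the Evaporation column: after filtering on Location, the Location column itself contains no NaNs, so masking it would always report 100% non-NaN. In the result, the `True` entry is the share of missing values and `False` is the share of measured ones.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real rain_df.
rain_df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Albury", "Albury", "Sydney"],
    "Evaporation": [np.nan, 3.1, np.nan, np.nan, 4.2],
})

# Filter to Albury, mask NaNs in Evaporation, then turn the
# True/False counts into proportions with normalize=True.
shares = (
    rain_df.query('Location == "Albury"')["Evaporation"]
    .isnull()
    .value_counts(normalize=True)
)
print(shares)
```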

Answer 1 (score 0, posted 2021-04-09 11:04:08)