Find specific combination of values in pandas dataframe

I am preparing a dataframe for machine learning. The data set contains weather data from several weather stations in Australia over a period of 10 years. One of the measured attributes is Evaporation, and about 50% of its values are missing. Now I want to find out whether the missing values are evenly distributed over all weather stations, or whether roughly half of the stations simply never measured Evaporation.

How can I find out how a value is distributed in combination with another attribute? I basically want to loop over the weather stations and get a count of NaNs and of normal values for each.

rain_df.query('Location == "Albury"').Location.count()

This gives me the number of measurement points from the weather station in Albury. Now how can I find out how many Evaporation values are NaN in Albury compared to normal (non-NaN) measurements?

You can use .isnull() to turn a series into a boolean mask: True for NaNs and False for everything else. Then you can use .value_counts(normalize=True) to get the proportions of NaN and non-NaN values in that series. Apply it to the column whose missing values you want to count, here Evaporation.

rain_df.query('Location == "Albury"').Evaporation.isnull().value_counts(normalize=True)
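To compare all stations at once rather than querying them one by one, the same mask can be grouped by Location. The following is only a sketch: the small rain_df built here is made-up toy data standing in for the real dataframe, and the output column names (n_missing, n_total, share_missing) are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real rain_df
rain_df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Albury", "Sydney", "Sydney", "Perth", "Perth"],
    "Evaporation": [np.nan, np.nan, np.nan, 4.2, np.nan, 5.0, 6.1],
})

# Boolean mask of missing Evaporation values, grouped by station
missing_by_station = (
    rain_df["Evaporation"].isnull()
    .groupby(rain_df["Location"])
    .agg(n_missing="sum", n_total="size", share_missing="mean")
    .sort_values("share_missing", ascending=False)
)
print(missing_by_station)
```

If the shares cluster around 0 and 1, some stations probably never recorded Evaporation; if they all sit near 0.5, the gaps are spread across stations.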
