
Pandas DataFrame: replace outliers

Thank you in advance for your help! (Code provided below.) (Data here.)

I would like to remove the outliers that fall outside the 5th/6th standard deviation in the columns 5 cm through 225 cm and replace them with the average value for that date (month/day) and depth. What is the best way to do that?

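import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Load the raw readings; the second CSV column is parsed as the datetime index
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()

# NOTE: the step that selects a single station is not shown in the post.
# As a placeholder assumption, every row is kept here.
df_selected_station = df_all_stations.copy()

# Forward-fill gaps, then average down to one row per day
df_selected_station = df_selected_station.ffill()
df_selected_station_D = df_selected_station.resample(rule='D').mean()

# Mean for each day of the year (1-366), taken across all years
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean = df_selected_station_D.groupby(by='Day').mean()
mean['Day'] = mean.index
mean.head()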


For a more general solution, assume that you are given a dataframe df with some column a.

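import numpy as np
from scipy import stats
# Assign through a single .loc call; df[mask]['a'] = ... would only modify a temporary copy
df.loc[np.abs(stats.zscore(df['a'])) > 5, 'a'] = df['a'].mean()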
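To get closer to what the question asks for (several depth columns, each outlier replaced by the mean for the same day of the year rather than by the overall column mean), the same idea could be extended roughly as follows. This is only a sketch under assumptions: the function name `replace_outliers_with_day_mean`, the `threshold` parameter, and the synthetic demo data are all hypothetical, and the z-score is computed directly in pandas (population standard deviation, `ddof=0`) instead of calling `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd


def replace_outliers_with_day_mean(df, columns, threshold=5.0):
    """Hypothetical helper: replace values whose per-column z-score exceeds
    `threshold` with the mean of the other values on the same day of year.
    Expects `df` to have a DatetimeIndex."""
    out = df.copy()
    day = out.index.dayofyear
    for col in columns:
        values = out[col]
        # z-score over the whole column; pandas skips NaN in mean() and std()
        z = (values - values.mean()) / values.std(ddof=0)
        is_outlier = z.abs() > threshold
        # Day-of-year mean, computed with the outliers masked out
        day_mean = values.mask(is_outlier).groupby(day).transform('mean')
        # Keep normal values, substitute the day-of-year mean for outliers
        out[col] = values.where(~is_outlier, day_mean)
    return out


# Synthetic demo data (made up for illustration): three years of daily
# values that depend only on the day of year, plus one injected spike
idx = pd.date_range('2018-01-01', '2020-12-31', freq='D')
seasonal = 10 + 5 * np.sin(2 * np.pi * idx.dayofyear.to_numpy() / 365)
demo = pd.DataFrame({'5 cm': seasonal, '225 cm': seasonal - 2}, index=idx)

spike_day = pd.Timestamp('2019-07-01')
demo.loc[spike_day, '5 cm'] = 500.0

cleaned = replace_outliers_with_day_mean(demo, ['5 cm', '225 cm'])
print(cleaned.loc[spike_day, '5 cm'])
```

Masking the outliers before taking the day-of-year mean keeps the spike from contaminating its own replacement value. If a day of year has no non-outlier values at all, the replacement is NaN, which may need a separate fill step.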


 