I am trying to remove outliers from a specific column in my dataframe in Python. I found a solution in a post from a few years ago that should work, but it checks every column of the dataframe:
df_final[(np.abs(stats.zscore(df_final)) < 3).all(axis=1)]
Since my dataframe has columns of different data types, such as dates, I get the following error when I run it:
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'
I feel like the solution to just get the outliers for a single column should be easy, but when I try
df_final[(np.abs(stats.zscore(df_final['rating'])) < 3).all(axis=1)]
to get the outliers of only the rating column, I get an error:
AxisError: axis 1 is out of bounds for array of dimension 1
I know (think?) that this problem has to do with the array that is created, but I don't understand it well enough to find a solution. Can anyone better explain it to me?
EDIT: It seems that df_final[(np.abs(stats.zscore(df_final['rating'])) < 3)]
works. Honestly not sure the reasoning behind it, so I'm still interested if anyone can explain or has a better solution.
np.abs(stats.zscore(df_final['rating'])) < 3
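This line returns a one-dimensional boolean array (a NumPy array or a pandas Series, depending on the SciPy version) with one True or False value per row. Passing it to df_final[...] keeps only the rows where the value is True, which is why the version in your edit works.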
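As a minimal sketch of that row filtering, the example below builds a small stand-in for df_final (the data is made up for illustration) and computes the z-score by hand with pandas, assuming the population standard deviation (ddof=0), which is the default that scipy.stats.zscore uses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_final: one numeric column and one date column.
df_final = pd.DataFrame({
    "rating": [4.0] * 19 + [100.0],
    "date": pd.date_range("2020-01-01", periods=20),
})

rating = df_final["rating"]
# Population z-score (ddof=0), assumed to match scipy.stats.zscore's default.
z = (rating - rating.mean()) / rating.std(ddof=0)

mask = np.abs(z) < 3        # one True/False value per row
filtered = df_final[mask]   # keeps only the rows where mask is True

print(len(df_final), len(filtered))
```

Only the rating column is used to build the mask, so the date column never enters the arithmetic and the Timestamp error does not occur.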
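For numpy.all, please refer to the documentation. It reduces a boolean array along an axis, so in the original snippet .all(axis=1) collapses the two-dimensional result for the whole dataframe into one value per row. With a single column the result is already one-dimensional, so there is no axis 1 (hence the AxisError) and the .all call is not needed.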
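To illustrate what .all(axis=1) does and why it fails for a single column, here is a small sketch with hand-written boolean arrays (the values are invented for the example):

```python
import numpy as np

# 2-D mask: rows are records, columns are features (the whole-dataframe case).
mask_2d = np.array([[True, True],
                    [True, False]])
rows_ok = mask_2d.all(axis=1)   # True only where every column in the row is True

# 1-D mask: what a single column produces. It only has axis 0.
mask_1d = np.array([True, False, True])
try:
    mask_1d.all(axis=1)
    raised = False
except ValueError:              # NumPy's AxisError is a subclass of ValueError
    raised = True

print(rows_ok, raised)
```

The two-dimensional mask is reduced to one value per row, while the one-dimensional mask raises because axis 1 does not exist for it.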