I'm trying to remove outliers from a dataset. In order to do that, I'm using:
df = df[df.attr < df.attr.mean() + df.attr.std()*3]
That seems to work as expected, but, when I do something like:
for i in range(df.shape[0]):
    print(df.attr[i])
Then I get a KeyError. It seems like Pandas isn't actually returning a new DataFrame with rows dropped. How do I actually remove those rows, and get a fully functional DataFrame back?
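I think you need positional indexing with DataFrame.iloc (DataFrame.ix was deprecated and then removed in pandas 1.0, and on an integer index it looked up labels anyway):
for i in range(df.shape[0]):
    print(df.iloc[i, df.columns.get_loc('attr')])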
Or Series.iloc:
for i in range(df.shape[0]):
    print(df.attr.iloc[i])
A simpler solution with Series.items (called Series.iteritems before pandas 2.0):
for i, val in df.attr.items():
    print(val)
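To see where the KeyError comes from, here is a minimal sketch with made-up data (the values are an assumption for illustration; only the column name attr comes from the question). Boolean filtering does return a new DataFrame, but the surviving rows keep their original index labels, so df.attr[i] looks up the label i, which may no longer exist. Calling reset_index(drop=True) is one further option, not mentioned above, that renumbers the labels so they match the positions again.

```python
import pandas as pd

# Hypothetical data: 19 ordinary values and one extreme value at label 5.
values = [10.0] * 20
values[5] = 1000.0
df = pd.DataFrame({"attr": values})

# Same filter as in the question: keep rows below mean + 3 * std.
filtered = df[df.attr < df.attr.mean() + df.attr.std() * 3]

# The row labelled 5 is gone, but the remaining labels are unchanged,
# so filtered.attr[5] would raise a KeyError.
print(filtered.index.tolist())

# Positional access works regardless of the labels.
by_position = [filtered.attr.iloc[i] for i in range(filtered.shape[0])]

# Optionally renumber the index 0..n-1 so label-based access works again.
renumbered = filtered.reset_index(drop=True)
print(renumbered.attr[5])
```

Whether to reset the index depends on whether the original labels still matter to you; if they identify the rows, keeping them and iterating with iloc or items is probably the safer choice.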
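First, find the rows which do not meet the criteria (which in your case is df.attr < df.attr.mean() + df.attr.std()*3). These are the outliers you want to remove.
x = ~(df.loc[:, 'attr'] < df.attr.mean() + df.attr.std()*3)
Next, use DataFrame.drop. It returns a new DataFrame rather than modifying df in place, so assign the result:
df = df.drop(x[x].index)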
See answers such as "How to drop a list of rows from Pandas dataframe?" for more information.
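The mask-then-drop approach can be sketched end to end as follows, again with made-up data (the values and the names threshold, is_outlier and cleaned are assumptions for illustration). The point to watch is that the mask passed to drop has to mark the rows to remove, so it is the negation of the keep-condition from the question.

```python
import pandas as pd

# Hypothetical data: 19 ordinary values and one extreme value at label 5.
values = [10.0] * 20
values[5] = 1000.0
df = pd.DataFrame({"attr": values})

threshold = df["attr"].mean() + df["attr"].std() * 3

# True for rows that fail the keep-condition, i.e. the outliers.
is_outlier = ~(df["attr"] < threshold)

# is_outlier[is_outlier] keeps only the True entries; their index
# holds the labels of the rows to drop.
cleaned = df.drop(is_outlier[is_outlier].index)

# The result matches plain boolean filtering with the keep-condition.
same = cleaned.equals(df[df["attr"] < threshold])
print(len(df), len(cleaned), same)
```

Both routes leave the original index labels in place, so the positional-access caveat from the first answer applies to cleaned as well.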