How do I remove rows from a dataframe?

Question

I'm trying to remove outliers from a dataset. In order to do that, I'm using:

df = df[df.attr < df.attr.mean() + df.attr.std()*3]

That seems to work as expected, but when I do something like:

for i in range(df.shape[0]):
    print(df.attr[i])

Then I get a KeyError. It seems like pandas isn't actually returning a new DataFrame with the rows dropped. How do I actually remove those rows and get a fully functional DataFrame back?

Answer 1

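I think you need positional access instead of label-based access. DataFrame.ix has been removed from pandas (and with an integer index it looked up labels anyway), so use DataFrame.iloc:

for i in range(df.shape[0]):
    print(df.iloc[i, df.columns.get_loc('attr')])

Or Series.iloc:

for i in range(df.shape[0]):
    print(df.attr.iloc[i])

A simpler solution with Series.items (called Series.iteritems in older pandas versions):

for i, val in df.attr.items():
    print(val)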
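
To illustrate why the KeyError appears: boolean filtering does return a new DataFrame with the rows removed, but the surviving rows keep their original index labels, so df.attr[i] looks up a label that may no longer exist. The following is a minimal sketch, assuming hypothetical sample data and Python 3 with a current pandas; reset_index(drop=True) renumbers the rows from 0 so that a plain counting loop works again.

```python
import pandas as pd

# Hypothetical sample data: 19 ordinary values and one extreme value at label 5.
df = pd.DataFrame({"attr": [1.0] * 5 + [1000.0] + [2.0] * 14})

# Same filter as in the question: keep rows below mean + 3 standard deviations.
threshold = df["attr"].mean() + df["attr"].std() * 3
filtered = df[df["attr"] < threshold]

# The extreme row is gone, but the remaining rows keep their old labels.
try:
    filtered["attr"][5]  # label-based lookup of a label that was removed
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Renumber the rows 0..n-1 so that label i and position i coincide again.
clean = filtered.reset_index(drop=True)
values = [clean["attr"][i] for i in range(clean.shape[0])]

print(len(df), len(filtered), raised_key_error)
```

If the original labels carry meaning and should be kept, the positional accessors shown in this answer avoid the need to reset the index.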

Answer 2

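First, find the indices of the rows to remove, i.e. the rows that do not meet the criterion (which in your case is df.attr < df.attr.mean() + df.attr.std()*3).

x = ~(df.loc[:, 'attr'] < df.attr.mean() + df.attr.std()*3)

Next, use DataFrame.drop. It returns a new DataFrame rather than modifying df, so assign the result back:

df = df.drop(x[x].index)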

See answers such as "How to drop a list of rows from Pandas dataframe?" for more information.
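
The drop-based approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the sample data is hypothetical, the code targets Python 3 with a current pandas, and the final reset_index(drop=True) is an addition not mentioned in this answer, included so that the rows are numbered from 0 afterwards.

```python
import pandas as pd

# Hypothetical sample data with a single extreme value.
df = pd.DataFrame({"attr": [1.0] * 5 + [1000.0] + [2.0] * 14})

threshold = df["attr"].mean() + df["attr"].std() * 3

# True for the rows to remove (the ones that fail the "keep" criterion).
is_outlier = ~(df["attr"] < threshold)

# drop() takes index labels and returns a new DataFrame.
result = df.drop(is_outlier[is_outlier].index).reset_index(drop=True)

# For comparison: plain boolean filtering yields the same rows.
by_mask = df[df["attr"] < threshold].reset_index(drop=True)
same = result.equals(by_mask)

print(len(result), same)
```

Both routes remove the same rows here; the boolean mask is shorter, while drop is convenient when the labels to remove are already available as a list.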

2 answers

solution1
2 2016-11-12 22:16:07

solution2
2 ACCEPTED 2016-11-12 22:23:19
