
How can I drop several rows from my Dataframe?

I have a dataframe (called my_df1) and want to drop several rows based on certain dates. How can I create a new dataframe (my_df2) without the dates '2020-05-01' and '2020-05-04'?

I tried the following, which did not work:

my_df2 = mydf_1[(mydf_1['Date'] != '2020-05-01') | (mydf_1['Date'] != '2020-05-04')] 


The problem is your logical operator. Use & (and) here instead of | (or), because you want to keep only the rows whose date is neither 2020-05-01 nor 2020-05-04.

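With |, no row is dropped: every date differs from at least one of the two values, so the combined condition is True for every row. Note also that & and | work element-wise on a Series and do not short-circuit.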
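
To see the difference between the two operators on a small example, here is a minimal sketch. The contents of my_df1 are made up for illustration (the real frame is not shown in the question), and the Date column is assumed to hold strings:

```python
import pandas as pd

# Hypothetical data standing in for my_df1; the real frame is not shown in the question.
my_df1 = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02", "2020-05-04", "2020-05-05"],
    "Value": [10, 20, 30, 40],
})

not_first = my_df1["Date"] != "2020-05-01"
not_second = my_df1["Date"] != "2020-05-04"

# OR: each date differs from at least one of the two values, so every row passes.
kept_with_or = my_df1[not_first | not_second]

# AND: a row is kept only if it differs from both values.
kept_with_and = my_df1[not_first & not_second]

print(len(kept_with_or))               # 4, nothing was dropped
print(kept_with_and["Date"].tolist())  # ['2020-05-02', '2020-05-05']
```

This is De Morgan's law in practice: "not (A or B)" is the same as "(not A) and (not B)".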

You can use isin together with the negation operator ~:

dates = ['2020-05-01', '2020-05-04']
my_df2 = mydf_1[~mydf_1['Date'].isin(dates)] 
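
The snippet above compares against strings. If the Date column has dtype datetime64 instead (the question does not say), one option is to convert the dates to drop as well so that both sides have the same type. A minimal sketch with a made-up frame:

```python
import pandas as pd

# Hypothetical frame whose Date column is datetime64, not strings.
my_df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-01", "2020-05-02", "2020-05-04", "2020-05-05"]),
    "Value": [10, 20, 30, 40],
})

# Convert the dates to drop to Timestamps so both sides have the same type.
dates = pd.to_datetime(["2020-05-01", "2020-05-04"])

my_df2 = my_df1[~my_df1["Date"].isin(dates)]
print(my_df2["Value"].tolist())  # [20, 40]
```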

The short explanation of your AND/OR mistake was already given by kanmaytacker. Here are a few additional recommendations:

Indexing in pandas:

By label: .loc
By integer position: .iloc

Selecting by label also works without .loc, but it is slower because it is composed of chained operations instead of a single internal operation consisting of nested loops (see here). Also, with .loc you can select on more than one axis at a time.
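
As a rough illustration of label versus position, and of selecting on both axes in one call, consider the sketch below. The frame and its string row labels are hypothetical and chosen only so that labels and positions differ visibly:

```python
import pandas as pd

# Hypothetical frame with string row labels so labels and positions differ visibly.
df = pd.DataFrame(
    {"a": [4, 1, 0], "b": [2, 3, 4], "c": [5, 5, 6]},
    index=["x", "y", "z"],
)

by_label = df.loc["z", "b"]    # row label "z", column label "b"
by_position = df.iloc[2, 1]    # third row, second column

# Rows and columns in one .loc call: a boolean mask for rows, a list of labels for columns.
subset = df.loc[df["a"] != 4, ["b", "c"]]

print(by_label, by_position)   # 4 4
print(subset.index.tolist())   # ['y', 'z']
```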

# Example with rows. The same logic applies to columns or additional axes.
df.loc[(df['a'] != 4) & (df['a'] != 1), :]  # ".loc" is the only addition
   a  b  c
2  0  4  6

Your index is a boolean mask. This is true for NumPy and, as a consequence, for pandas too.

(df['a']!=4) & (df['a']!=1)
0    False
1    False
2     True
Name: a, dtype: bool
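
The answer does not show how df was built, so the following is only a sketch that reproduces the outputs above. Row 2 (a=0, b=4, c=6) comes from the answer; the b and c values in rows 0 and 1 are invented. It also shows that the mask can be stored, converted to a NumPy array, and inverted with ~:

```python
import pandas as pd

# Hypothetical frame: only row 2 (a=0, b=4, c=6) appears in the answer's output;
# the b and c values in rows 0 and 1 are made up.
df = pd.DataFrame({"a": [4, 1, 0], "b": [2, 3, 4], "c": [5, 5, 6]})

mask = (df["a"] != 4) & (df["a"] != 1)   # boolean Series aligned with df's index

print(mask.tolist())                 # [False, False, True]
print(df.loc[mask].values.tolist())  # [[0, 4, 6]]

# The underlying NumPy array works as a mask too, and ~ inverts it.
np_mask = mask.to_numpy()
print(df.loc[~np_mask, "a"].tolist())  # [4, 1]
```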

