How can I drop several rows from my Dataframe?

I have a DataFrame (called mydf_1) and want to drop several rows based on certain dates. How can I create a new DataFrame (my_df2) without the dates '2020-05-01' and '2020-05-04'?

I tried the following, which did not work:

my_df2 = mydf_1[(mydf_1['Date'] != '2020-05-01') | (mydf_1['Date'] != '2020-05-04')] 
my_df2.head()

The problem is your logical operator. You should combine the two conditions with and (&) instead of or (|), since you want the rows whose date is neither 2020-05-01 nor 2020-05-04. With or, every row passes, because any date differs from at least one of the two.

The element-wise operators & and | do not short-circuit: both comparisons are evaluated for every row and then combined, hence the result.
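A small sketch of the difference between the two operators. The frame below is made-up data standing in for the asker's (the `Value` column and the extra dates are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data; only the column name "Date" comes from the question.
mydf_1 = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02", "2020-05-04", "2020-05-05"],
    "Value": [10, 20, 30, 40],
})

not_first = mydf_1["Date"] != "2020-05-01"
not_fourth = mydf_1["Date"] != "2020-05-04"

# OR: every row differs from at least one of the two dates, so nothing is dropped.
kept_with_or = mydf_1[not_first | not_fourth]

# AND: a row is kept only if it differs from both dates.
my_df2 = mydf_1[not_first & not_fourth]

print(len(kept_with_or))        # 4
print(my_df2["Date"].tolist())  # ['2020-05-02', '2020-05-05']
```

Naming the two masks before combining them also sidesteps the operator-precedence issue that makes the parentheses in the original one-liner mandatory.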

You can use isin with the negation operator ~:

# dates to exclude
dates = ['2020-05-01', '2020-05-04']
# keep only the rows whose Date is not in the list
my_df2 = mydf_1[~mydf_1['Date'].isin(dates)]
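As a rough check that the negated isin mask matches the chained comparisons, here is a sketch on made-up data (the frame contents are assumptions, and `Date` is assumed to hold plain strings as in the question):

```python
import pandas as pd

# Hypothetical data; "Date" is stored as strings here.
mydf_1 = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04"],
    "Value": [1, 2, 3, 4],
})

dates = ["2020-05-01", "2020-05-04"]

# Mask built with isin and negation.
mask_isin = ~mydf_1["Date"].isin(dates)

# Same mask built from two comparisons joined with &.
mask_and = (mydf_1["Date"] != dates[0]) & (mydf_1["Date"] != dates[1])

print(mask_isin.equals(mask_and))  # True

my_df2 = mydf_1[mask_isin]
print(my_df2["Date"].tolist())     # ['2020-05-02', '2020-05-03']
```

The isin form scales better when the list of dates grows, since adding a date only means extending the list.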

The short explanation of your mistake (AND versus OR) was already given by kanmaytacker. Here are a few additional recommendations:

Indexing in pandas:

By label: .loc
By integer position: .iloc

Selecting by label also works without .loc, but it's slower because it's composed of chained operations instead of a single internal operation consisting of nested loops (see here). Also, with .loc you can select on more than one axis at a time.

# Example with rows; the same logic applies to columns.
df.loc[(df['a'] != 4) & (df['a'] != 1), :]  # ".loc" is the only addition
>>>
   a  b  c
2  0  4  6
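To illustrate selecting on both axes at once, here is a sketch with a hypothetical frame. Only row 2 matches the output shown above; the values in rows 0 and 1 are assumptions, since the original frame is not given:

```python
import pandas as pd

# Hypothetical frame: row 2 matches the output above, rows 0 and 1 are invented.
df = pd.DataFrame({"a": [4, 1, 0], "b": [2, 3, 4], "c": [5, 7, 6]})

mask = (df["a"] != 4) & (df["a"] != 1)

# Rows only: all columns are kept.
rows_only = df.loc[mask, :]

# Rows and columns in one call: boolean mask for rows, labels for columns.
rows_and_cols = df.loc[mask, ["a", "c"]]

# The same selection by integer position with .iloc.
by_position = df.iloc[[2], [0, 2]]

print(rows_and_cols)
#    a  c
# 2  0  6
print(rows_and_cols.equals(by_position))  # True
```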

Your index is a boolean mask, i.e. a Series of True/False values. This is true for NumPy and, as a consequence, for pandas too.

(df['a']!=4) & (df['a']!=1)
>>>
0    False
1    False
2     True
Name: a, dtype: bool
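The same masking idea in plain NumPy, as a minimal sketch. The array values are assumptions chosen to reproduce the False, False, True pattern above:

```python
import numpy as np

# Hypothetical values mirroring column "a".
a = np.array([4, 1, 0])

# Element-wise comparisons combined with &, giving a boolean array.
mask = (a != 4) & (a != 1)

print(mask.tolist())     # [False, False, True]
print(a[mask].tolist())  # [0]
```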
