Removing rows that contain the same dates from another dataframe - python - pandas

How do I remove all rows whose dates also appear in another dataframe? I want to keep only the rows whose dates are unique across the two dataframes, with all columns intact. Also, I cannot use a merge.

import pandas as pd
df1 = pd.DataFrame({
        'date': ['2001-02-01','2001-02-02','2001-02-03', '2001-02-04'],
        'value': [101, 201, 310, 410]})
df2 = pd.DataFrame({
        'date': ['2001-02-03','2001-02-04','2001-02-05', '2001-02-06'],
        'value': [121, 231, 610, 990]})
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])

This gives the two dataframes (df1, then df2):

         date  value
0  2001-02-01    101
1  2001-02-02    201
2  2001-02-03    310
3  2001-02-04    410
---
         date  value
0  2001-02-03    121
1  2001-02-04    231
2  2001-02-05    610
3  2001-02-06    990

Desired dataframe:

print(df3)

         date  value
0  2001-02-01    101
1  2001-02-02    201
2  2001-02-05    610
3  2001-02-06    990

I tried df1[~df1.date.notin(df2.date)], but this throws an error: AttributeError: 'Series' object has no attribute 'notin'

I also tried df1[~df1.date.isin(df2.date) == False], but this returns only the rows of df1 whose dates also appear in df2:

    date    value
2   2001-02-03  310
3   2001-02-04  410

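Concatenate the two dataframes, then drop every row whose date appears more than once. Passing keep=False removes all copies of a duplicated date instead of keeping the first one: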

df3 = pd.concat([df1, df2]).drop_duplicates(subset='date', keep=False)
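
For comparison, here is a minimal sketch of an isin-based alternative, which also avoids merge. It is not part of the answer above. The attempts in the question were close: ~ binds more tightly than ==, so ~mask == False inverts the mask twice and selects the shared dates. Dropping the == False and filtering each dataframe against the other gives the desired rows. The names only_in_df1 and only_in_df2 are made up for illustration, and the sample data assumes the last date in df2 is 2001-02-06, as in the printed output.

```python
import pandas as pd

# Sample data matching the printed dataframes in the question.
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2001-02-01', '2001-02-02', '2001-02-03', '2001-02-04']),
    'value': [101, 201, 310, 410]})
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2001-02-03', '2001-02-04', '2001-02-05', '2001-02-06']),
    'value': [121, 231, 610, 990]})

# Keep the rows of each dataframe whose date is absent from the other one.
only_in_df1 = df1[~df1['date'].isin(df2['date'])]
only_in_df2 = df2[~df2['date'].isin(df1['date'])]

# Stack the two filtered pieces and renumber the index from 0.
df3 = pd.concat([only_in_df1, only_in_df2], ignore_index=True)
print(df3)
```

One difference worth knowing: drop_duplicates(subset='date', keep=False) also removes dates that repeat within a single dataframe, while the isin version keeps those rows as long as the date does not occur in the other dataframe. Which behavior is right depends on the data.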

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.

 