简体   繁体   中英

Pandas filter rows by last 12 months in data frame

I need to keep only the rows with other columns for months with past 12 months. The max date here is 2022-08-01, so the resulting dataframe should have data from 2021-09-01 to 2022-08-01 Input data frame:

    d = {'MONTH': ['2021-01-01', '2021-02-01','2021-03-01','2021-04-01',
               '2021-05-01', '2021-06-01','2021-07-01','2021-08-01',
               '2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01'], 
     'col2': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
             3,4,1,2,
              3,4,1,2],
    'col3': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
            3,4,1,2,
              3,4,1,2],
    'col4': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,3,4,1,2,
              3,4,1,2]}
df = pd.DataFrame(data=d)

Resulting in dataframe:

d = {'MONTH': ['2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01'], 
     'col2': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
             3,4,1,2,
              3,4,1,2],
    'col3': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
            3,4,1,2,
              3,4,1,2],
    'col4': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,3,4,1,2,
              3,4,1,2]}
df = pd.DataFrame(data=d)

Use pd.to_datetime

df['MONTH'] = pd.to_datetime(df['MONTH'])
df_new = df[df['MONTH'] >= '2021-09-01']

If you want to make it dynamic depending upon max date of the dataset, use relativedelta

from dateutil.relativedelta import relativedelta
df_new = df[df['MONTH'] >= df['MONTH'].max() - relativedelta(months=11)]
import pandas as pd
d = {'MONTH': ['2021-01-01', '2021-02-01','2021-03-01','2021-04-01',
           '2021-05-01', '2021-06-01','2021-07-01','2021-08-01',
           '2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
           '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
           '2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
           '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
           '2022-05-01', '2022-06-01','2022-07-01','2022-08-01'], 
 'col2': [3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,
         3,4,1,2,
          3,4,1,2],
'col3': [3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,
        3,4,1,2,
          3,4,1,2],
'col4': [3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,
          3,4,1,2,3,4,1,2,
          3,4,1,2]}
          
 df = pd.DataFrame(data=d)
 rslt_df = df[(df['MONTH'] >= '2021-09-01') & (df['MONTH']<='2022-08-01')]
 print(rslt_df)

you can use and condition to select data from dataframe

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM