Pandas: filter rows by last 12 months in a data frame
I need to keep only the rows (with all their other columns) whose month falls within the past 12 months. The max date here is 2022-08-01, so the resulting dataframe should have data from 2021-09-01 to 2022-08-01. Input data frame:
import pandas as pd
d = {'MONTH': ['2021-01-01', '2021-02-01','2021-03-01','2021-04-01',
'2021-05-01', '2021-06-01','2021-07-01','2021-08-01',
'2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01'],
'col2': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col3': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col4': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2]}
df = pd.DataFrame(data=d)
Resulting dataframe:
d = {'MONTH': ['2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01'],
'col2': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col3': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col4': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2]}
df = pd.DataFrame(data=d)
Use pd.to_datetime to convert the MONTH column to datetimes, then filter by date:
df['MONTH'] = pd.to_datetime(df['MONTH'])
df_new = df[df['MONTH'] >= '2021-09-01']
If you want the cutoff to be dynamic, based on the max date in the dataset, use relativedelta:
from dateutil.relativedelta import relativedelta
df_new = df[df['MONTH'] >= df['MONTH'].max() - relativedelta(months=11)]
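As a possible alternative to relativedelta, pandas ships its own pd.DateOffset, which avoids the extra dateutil import. The following is only a minimal sketch on a small made-up frame (20 consecutive month starts, not the question's exact data); the name cutoff is introduced here for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, not the question's frame):
# 20 consecutive month starts from 2021-01-01 to 2022-08-01, stored as strings.
df = pd.DataFrame({
    "MONTH": pd.date_range("2021-01-01", "2022-08-01", freq="MS").strftime("%Y-%m-%d"),
    "col2": range(20),
})

# Convert the strings to datetimes so date arithmetic works.
df["MONTH"] = pd.to_datetime(df["MONTH"])

# 11 months before the latest month gives a 12-month window, inclusive of both ends.
cutoff = df["MONTH"].max() - pd.DateOffset(months=11)
df_new = df[df["MONTH"] >= cutoff]

print(cutoff.date(), len(df_new))
```

Subtracting 11 months rather than 12 is deliberate: the latest month itself counts as one of the 12.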
import pandas as pd
d = {'MONTH': ['2021-01-01', '2021-02-01','2021-03-01','2021-04-01',
'2021-05-01', '2021-06-01','2021-07-01','2021-08-01',
'2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
'2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
'2022-05-01', '2022-06-01','2022-07-01','2022-08-01'],
'col2': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col3': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2],
'col4': [3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2,
3,4,1,2]}
df = pd.DataFrame(data=d)
# MONTH holds ISO-formatted (YYYY-MM-DD) strings here, so string comparison follows date order
rslt_df = df[(df['MONTH'] >= '2021-09-01') & (df['MONTH'] <= '2022-08-01')]
print(rslt_df)
You can combine two conditions with & (logical AND) to select the rows between the start and end dates.
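The same two-sided condition can also be written with Series.between, which is inclusive on both ends by default. This is a rough sketch, assuming MONTH has already been converted to datetimes; the frame below only mimics the question's layout (20 months followed by 8 repeated months), and the names start and end are made up for the example.

```python
import pandas as pd

# Illustrative data (an assumption): 2021-01 .. 2022-08, then 2022-01 .. 2022-08 repeated.
months = (
    list(pd.date_range("2021-01-01", "2022-08-01", freq="MS"))
    + list(pd.date_range("2022-01-01", "2022-08-01", freq="MS"))
)
df = pd.DataFrame({"MONTH": months, "col2": range(len(months))})

# Derive the window from the data instead of hard-coding the dates.
end = df["MONTH"].max()
start = end - pd.DateOffset(months=11)

# between() keeps rows where start <= MONTH <= end; duplicate months are all kept.
rslt_df = df[df["MONTH"].between(start, end)]

print(len(df), len(rslt_df))
```

With this layout the filter keeps 20 of the 28 rows, the same row count as the expected result in the question.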