How to exclude a date in a pandas DataFrame if it is not the end of the month
I have the following dataset:
import pandas as pd
df = pd.DataFrame({'PORTFOLIO': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
                   'DATE': ['28-02-2018', '31-03-2018', '30-04-2018', '31-05-2018', '30-06-2018', '31-07-2018', '31-08-2018',
                            '30-09-2018', '31-10-2018', '30-11-2018', '31-12-2018', '31-01-2019', '28-02-2019', '05-03-2019'],
                   'IRR': [.7, .8, .9, .4, .2, .3, .4, .9, .7, .8, .9, .4, .7, .8],
                   })
# The dates are day-first (dd-mm-yyyy), so parse them explicitly: '05-03-2019' is 5 March 2019
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%m-%Y')
df
PORTFOLIO DATE IRR
0 A 2018-02-28 0.7
1 A 2018-03-31 0.8
2 A 2018-04-30 0.9
3 A 2018-05-31 0.4
4 A 2018-06-30 0.2
5 A 2018-07-31 0.3
6 A 2018-08-31 0.4
7 A 2018-09-30 0.9
8 A 2018-10-31 0.7
9 A 2018-11-30 0.8
10 A 2018-12-31 0.9
11 A 2019-01-31 0.4
12 A 2019-02-28 0.7
13 A 2019-03-05 0.8
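Why the parsing format matters here: the DATE strings are dd-mm-yyyy, so a value like 05-03-2019 is ambiguous, and a parser that assumes month-first reads it as 3 May instead of 5 March. This small sketch assumes every date in the column uses the dd-mm-yyyy layout.

```python
import pandas as pd

raw = pd.Series(['28-02-2019', '05-03-2019'])

# Explicit day-first format: '05-03-2019' is parsed as 5 March 2019
parsed = pd.to_datetime(raw, format='%d-%m-%Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())  # ['2019-02-28', '2019-03-05']

# A single value parsed with dayfirst=True gives the same date
single = pd.to_datetime('05-03-2019', dayfirst=True)
print(single)  # 2019-03-05 00:00:00
```

Either an explicit `format` or `dayfirst=True` works; `format` is stricter because it raises on any string that does not match.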
As you can see, all the dates are "end of month" except for 05-03-2019. What I need is to drop a row if its DATE is not the end of the month.
My poor temporary solution is
df2 = df[df.DATE < '2019-03-01']
which is not good, because the code should be more general (it only works as long as the one non-month-end date is the latest one). How do I do that?
This can be done in a one-liner using pandas.Series.dt.is_month_end:
df[pd.to_datetime(df["DATE"], dayfirst=True).dt.is_month_end]
This keeps only the rows whose DATE is the last day of its month.
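A self-contained sketch of the one-liner, using the last three rows of the question's data as a stand-in. It also cross-checks the mask against a hand-rolled equivalent (day of month equals `days_in_month`), which shows that `is_month_end` tests the last calendar day, not the last business day.

```python
import pandas as pd

# Stand-in data: the last three rows of the question's dataset
df = pd.DataFrame({
    'PORTFOLIO': ['A', 'A', 'A'],
    'DATE': ['31-01-2019', '28-02-2019', '05-03-2019'],
    'IRR': [.4, .7, .8],
})

dates = pd.to_datetime(df['DATE'], dayfirst=True)

# True where the date is the last calendar day of its month
mask = dates.dt.is_month_end

# Equivalent check: the day of the month equals the number of days in that month
same_as_manual = mask.equals(dates.dt.day == dates.dt.days_in_month)

kept = df[mask]  # keeps 31-01-2019 and 28-02-2019
print(kept)
```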
You can use pandas.tseries.offsets.MonthEnd to compare each date with the end of its month, and then use boolean indexing on the DataFrame to keep only the rows that satisfy the condition:
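from pandas.tseries.offsets import MonthEnd
# Parse the day-first strings (a no-op if DATE is already datetime64)
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
# Adding MonthEnd(0) rolls each date forward to its month end; dates already on a month end are unchanged
df[df['DATE'] == df['DATE'] + MonthEnd(0)]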
PORTFOLIO DATE IRR
0 A 2018-02-28 0.7
1 A 2018-03-31 0.8
2 A 2018-04-30 0.9
3 A 2018-05-31 0.4
4 A 2018-06-30 0.2
5 A 2018-07-31 0.3
6 A 2018-08-31 0.4
7 A 2018-09-30 0.9
8 A 2018-10-31 0.7
9 A 2018-11-30 0.8
10 A 2018-12-31 0.9
11 A 2019-01-31 0.4
12 A 2019-02-28 0.7
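The choice of `MonthEnd(0)` rather than `MonthEnd(1)` is what makes the comparison work. A short sketch on two dates taken from the question: `MonthEnd(0)` leaves a month-end date where it is, while `MonthEnd(1)` always advances to the next month end, so the equality would never hold.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.to_datetime(pd.Series(['28-02-2019', '05-03-2019']), format='%d-%m-%Y')

# MonthEnd(0): roll forward to the month end, but stay put if already on one
rolled0 = (dates + MonthEnd(0)).dt.strftime('%Y-%m-%d').tolist()
print(rolled0)  # ['2019-02-28', '2019-03-31']

# MonthEnd(1): always move to the next month end, even from a month-end date
rolled1 = (dates + MonthEnd(1)).dt.strftime('%Y-%m-%d').tolist()
print(rolled1)  # ['2019-03-31', '2019-03-31']

# Only MonthEnd(0) gives a date-equals-itself test for "is month end"
mask = (dates == dates + MonthEnd(0)).tolist()
print(mask)  # [True, False]
```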
I'm posting this to expand on @Christian Sloper's answer. I find an answer easier to reference when it is self-contained, and I think it will help others.
I created a new column called MonthEnd and used it as a filter to select only the rows that are not month end.
import pandas as pd
df = pd.DataFrame({'PORTFOLIO': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
                   'DATE': ['28-02-2018', '31-03-2018', '30-04-2018', '31-05-2018', '30-06-2018', '31-07-2018', '31-08-2018',
                            '30-09-2018', '31-10-2018', '30-11-2018', '31-12-2018', '31-01-2019', '28-02-2019', '05-03-2019'],
                   'IRR': [.7, .8, .9, .4, .2, .3, .4, .9, .7, .8, .9, .4, .7, .8],
                   })
# New column called MonthEnd: True where the (day-first) DATE is the last day of its month
df['MonthEnd'] = pd.to_datetime(df['DATE'], dayfirst=True).dt.is_month_end
# Filter to get only the rows that are not month end
df[~df["MonthEnd"]]
DataFrame:
DATE IRR PORTFOLIO MonthEnd
0 28-02-2018 0.7 A True
1 31-03-2018 0.8 A True
2 30-04-2018 0.9 A True
3 31-05-2018 0.4 A True
4 30-06-2018 0.2 A True
5 31-07-2018 0.3 A True
6 31-08-2018 0.4 A True
7 30-09-2018 0.9 A True
8 31-10-2018 0.7 A True
9 30-11-2018 0.8 A True
10 31-12-2018 0.9 A True
11 31-01-2019 0.4 A True
12 28-02-2019 0.7 A True
13 05-03-2019 0.8 A False
After filtering:
DATE IRR PORTFOLIO MonthEnd
13 05-03-2019 0.8 A False
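The filter above returns the rows to discard. The question asks for the opposite, so here is a minimal sketch of the inverse using the last three rows of the question's data as a stand-in: keep the rows where MonthEnd is True, then drop the helper column and renumber the index.

```python
import pandas as pd

# Stand-in data: the last three rows of the question's dataset
df = pd.DataFrame({
    'PORTFOLIO': ['A', 'A', 'A'],
    'DATE': ['31-01-2019', '28-02-2019', '05-03-2019'],
    'IRR': [.4, .7, .8],
})
df['MonthEnd'] = pd.to_datetime(df['DATE'], dayfirst=True).dt.is_month_end

# Keep only month-end rows, remove the helper column, and reset the index to 0..n-1
result = df[df['MonthEnd']].drop(columns='MonthEnd').reset_index(drop=True)
print(result)
```

Dropping the helper column is optional; it is kept in the answer above only to make the filter visible.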