How to manipulate pandas datetime objects to get the last date of the previous month
I have a pandas DataFrame and want to turn every date into the last date of the previous month. For example, "2020-02-04" should turn into "2020-01-31", "2020-03-03" should turn into "2020-02-29" (2020 is a leap year), and so on. My df looks like this (the month column already holds the right month for my wanted date):
In[76]: dfall[["date", "month"]]
Out[76]:
         date  month
0  2020-02-04      1
1  2020-03-03      2
2  2020-04-02      3
3  2020-05-05      4
4  2020-06-03      5
5  2020-07-02      6
Now I tried this:
import calendar
import datetime
today = datetime.datetime.now()
dfall.date = str(today.year) + "-" + str(dfall.month) + "-" + str(calendar.monthrange(today.year,dfall.month)[1])
The idea was to build the new date by adding the strings together. But this code raises an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I know the error is coming from this part: str(calendar.monthrange(today.year,dfall.month)[1]) (without this part the code runs without error, but the result is not what I want). It's probably because Python doesn't know which month to take from dfall.month. Does anybody know how I could handle this problem?
As an alternative, you could try this instead:
dfall.date=dfall.date.apply(lambda x: x.replace(day=1)- pd.Timedelta(days=1))
If dfall.date is of type string, try this instead:
dfall.date=pd.to_datetime(dfall.date).apply(lambda x: x.replace(day=1)- pd.Timedelta(days=1))
You could also try another vectorized alternative, written by Kyle Barron, which avoids df.apply(lambda x: x.replace(day=1)) and is reported to run up to 8.5x faster:
def vec_dt_replace(series, year=None, month=None, day=None):
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day})
#dfall.date=pd.to_datetime(dfall.date) #(if dfall.date is type string)
dfall.date=vec_dt_replace(dfall.date,day=1)- pd.Timedelta(days=1)
If you want to keep your original solution, then:
Change str(dfall.month) to dfall.month.astype(str)
Change str(calendar.monthrange(today.year,dfall.month)[1]) to dfall.month.apply(lambda x:calendar.monthrange(today.year,x)[1]).astype(str)
Once you have the string, convert it to datetime: pd.to_datetime(dfall.date)
import calendar
import datetime
today = datetime.datetime.now()
dfall.date = str(today.year) + "-" + dfall.month.astype(str) + "-" + dfall.month.apply(lambda x:calendar.monthrange(today.year,x)[1]).astype(str)
dfall.date = pd.to_datetime(dfall.date)
print(dfall)
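The ValueError from the question can be reproduced in isolation: calendar.monthrange expects a single integer month, and in CPython its range check on the month cannot be evaluated for a whole Series. A minimal sketch, assuming a made-up `months` Series as a stand-in for dfall.month and a year fixed to 2020 rather than taken from today's date:

```python
import calendar

import pandas as pd

# Hypothetical stand-in for dfall.month
months = pd.Series([1, 2, 3])

# calendar.monthrange works on scalars, so passing a Series raises ValueError
try:
    calendar.monthrange(2020, months)
    raised = False
except ValueError:
    raised = True

# Calling it once per element returns the last day of each month
last_days = months.apply(lambda m: calendar.monthrange(2020, m)[1])

print(raised)
print(last_days.tolist())
```

This is why the fix above routes the call through apply: each row's month is passed as a scalar.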
Output of all solutions:
dfall[["date", "month"]]
         date  month
0  2020-01-31      1
1  2020-02-29      2
2  2020-03-31      3
3  2020-04-30      4
4  2020-05-31      5
5  2020-06-30      6
Alternative approach:
from datetime import datetime, timedelta

def convert_date(date_str):
    date = datetime.strptime(date_str, '%Y-%m-%d')
    # Subtracting the day-of-month lands on the last day of the previous month
    return (date - timedelta(days=date.day)).strftime('%Y-%m-%d')

dfall.date.apply(convert_date)
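The same "subtract the day of the month" idea can probably be written without apply. A minimal sketch, assuming a made-up `dates` Series of 'YYYY-MM-DD' strings in place of dfall.date (this is a vectorized variant, not the code from the answer above):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's date column
dates = pd.Series(["2020-02-04", "2020-03-03", "2021-01-15"])

parsed = pd.to_datetime(dates)
# Subtract each date's own day-of-month, expressed as a Timedelta in days
prev_month_end = parsed - pd.to_timedelta(parsed.dt.day, unit="D")

print(prev_month_end.dt.strftime("%Y-%m-%d").tolist())
```

The January row shows that the year boundary is handled as well, since the subtraction is plain date arithmetic.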
Assuming the 'date' column is of type string (use .astype(str) or strftime otherwise), you can cast the year-month part to datetime and subtract a timedelta of one day:
dfall['lastdaylastmonth'] = pd.to_datetime(dfall['date'].str[:-3]) - pd.Timedelta(days=1)
# dfall['lastdaylastmonth']
# 0 2020-01-31
# 1 2020-02-29
# 2 2020-03-31
# 3 2020-04-30
# 4 2020-05-31
# 5 2020-06-30
# Name: lastdaylastmonth, dtype: datetime64[ns]
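To check the slicing approach on edge cases, here is a small sketch that wraps it in a helper and runs it on a January date and a non-leap-year March date. The function name `last_day_of_previous_month` and the sample data are made up for illustration, and the explicit format="%Y-%m" is an addition so the parsing does not rely on format inference:

```python
import pandas as pd


def last_day_of_previous_month(date_strings: pd.Series) -> pd.Series:
    """Hypothetical helper: expects 'YYYY-MM-DD' strings."""
    # Drop the '-DD' suffix and parse the rest as the first day of that month
    month_start = pd.to_datetime(date_strings.str[:-3], format="%Y-%m")
    # One day earlier is the last day of the previous month
    return month_start - pd.Timedelta(days=1)


sample = pd.Series(["2021-01-15", "2021-03-10", "2020-03-03"])
result = last_day_of_previous_month(sample)

print(result.dt.strftime("%Y-%m-%d").tolist())
```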
import datetime
from datetime import timedelta

import pandas as pd

df = pd.DataFrame({"date": ['2020-02-04', '2020-03-03', '2020-04-02', '2020-05-05', '2020-06-03', '2020-07-02'],
                   "month": [1, 2, 3, 4, 5, 6]})

# Convert the date strings to datetime objects
def change_time_format(date_str):
    return datetime.datetime.strptime(date_str, "%Y-%m-%d")

df.date = df.date.apply(change_time_format)
dates = list(df.date)
previous_m_last_date = []
for d in dates:
    # Subtracting the day-of-month gives the last day of the previous month
    days = d.day
    u_date = d - timedelta(days)
    previous_m_last_date.append(u_date)
df["updated_date"] = previous_m_last_date
df
Another approach:
import datetime

# Assumes df["date"] holds 'YYYY-MM-DD' strings
for index, d in df.iterrows():
    temp = d["date"]
    dtObj = datetime.datetime.strptime(temp, "%Y-%m-%d")
    newDt = dtObj - datetime.timedelta(days=dtObj.day)
    # Write back with .loc to avoid chained assignment
    df.loc[index, "date"] = datetime.datetime.strftime(newDt, "%Y-%m-%d")
from datetime import datetime

import pandas as pd

dates = [datetime(2020, 2, 4), datetime(2020, 3, 3), datetime(2020, 4, 2), datetime(2020, 5, 5), datetime(2020, 6, 3), datetime(2020, 7, 2)]
month = [1, 2, 3, 4, 5, 6]
ts = pd.Series(month, index=dates)
# Shift the index back by one month-end ('M'; newer pandas versions spell this alias 'ME')
date_col = ts.shift(-1, freq='M').index
pd.DataFrame({'Dates': date_col, 'Month': month})
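If the dates live in a column rather than in the index, the month-end offset can likely be subtracted from the column directly. A minimal sketch, assuming a made-up frame with an already-parsed datetime column; it uses the pd.offsets.MonthEnd object instead of the string alias, which is a swap on my part and not what the answer above does:

```python
import pandas as pd

# Hypothetical frame with a datetime64 column
df = pd.DataFrame(
    {"date": pd.to_datetime(["2020-02-04", "2020-03-03", "2020-01-31"])}
)

# Subtracting one MonthEnd offset moves each date back to the previous month end
df["prev_month_end"] = df["date"] - pd.offsets.MonthEnd(1)

print(df["prev_month_end"].dt.strftime("%Y-%m-%d").tolist())
```

The third row is already a month end, and it still moves back one full month, to the end of December 2019.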