Python: How do I count the days per month in the 30 days after a date?
I have a dataframe containing dates, and I would like to process the data as follows for feature engineering:
df
date
2016/1/1
2015/2/10
2016/3/25
After processing, I would like the df to look like this:
date       Jan  Feb  Mar  Apr
2016/1/1    30    0    0    0   // 1/1 to 1/30: all 30 days fall in Jan
2015/2/10    0   19   11    0   // 2/10 to 3/11: 19 days in Feb, 11 days in Mar
2016/3/25    0    0    7   23   // 3/25 to 4/23: 7 days in Mar, 23 days in Apr
Get the 30 days after df["date"]:
df["date"] + timedelta(days=30)
Count how many of those 30 days fall in each month.
Is there a quick way to do this?
Thanks.
Just go step by step. First, offset your original date by + pd.to_timedelta('30d'). Then create a column indicating the month only, using df.date.dt.month. Then create a column with the end-of-month date for each date; some ideas for that are in the question "Want the last day of each month for a data frame in pandas". Finally, fill in a matrix where the columns are the 12 months, setting the values in the columns for the month and month+1.
By enriching your DataFrame one column at a time, you can easily move from your input to your desired output. There is unlikely to be a magic method that does everything in a single call.
Read all about date/time functions in Pandas here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html - there are a lot!
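As a rough illustration of the step-by-step idea, here is a minimal sketch. The helper name month_counts and the sample frame are made up for this example and are not part of the question. It finds the end of the current month with pd.offsets.MonthEnd(0), assigns as many days as fit, then moves on to the next month. Walking forward like this, rather than filling only month and month+1, also covers a window starting on 31 January of a non-leap year, which touches three months.

```python
import pandas as pd

# Sample data in the shape used in the question (assumed for illustration).
df = pd.DataFrame({"date": ["2016/1/1", "2015/2/10", "2016/3/25"]})
df["date"] = pd.to_datetime(df["date"])

names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_counts(start, window=30):
    """Hypothetical helper: days per calendar month in a window of
    `window` days that starts on `start` (start date included)."""
    counts = [0] * 12
    remaining = window
    cur = start
    while remaining > 0:
        # Last day of the month that `cur` falls in.
        month_end = cur + pd.offsets.MonthEnd(0)
        # Days left in this month, capped by what the window still needs.
        n = min((month_end - cur).days + 1, remaining)
        counts[cur.month - 1] += n
        remaining -= n
        # Continue on the first day of the following month.
        cur = month_end + pd.Timedelta(days=1)
    return counts


# One row of 12 month columns per input date.
counts = pd.DataFrame([month_counts(d) for d in df["date"]],
                      index=df.index, columns=names)
result = df.join(counts)
print(result[["date", "Jan", "Feb", "Mar", "Apr"]])
```

Columns for months that no window touches stay at 0 and can be dropped afterwards if only a few months are of interest.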
You can use a custom function with date_range and groupby with size:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
date = df[['date']]
names = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

def f(x):
    # 30 consecutive days starting at the row's date
    a = pd.date_range(x['date'], periods=30)
    # number of those days that fall in each month number
    return pd.Series(a).groupby(a.month).size()

df = df.apply(f, axis=1).fillna(0).astype(int)
# month numbers start at 1, so enumerate the names from 1
df = df.rename(columns=dict(enumerate(names, 1)))
df = date.join(df)
print (df)
        date  Jan  Feb  Mar  Apr
0 2016-01-01   30    0    0    0
1 2015-02-10    0   19   11    0
2 2016-03-25    0    0    7   23
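To see what the custom function computes for one row, here is a small sketch for a single start date taken from the question. The variable names days and per_month are only for illustration.

```python
import pandas as pd

# 30 consecutive days starting on 2015-02-10 (start date included).
days = pd.date_range("2015-02-10", periods=30)

# Group the days by month number and count how many fall in each month.
per_month = pd.Series(days).groupby(days.month).size()
print(per_month)
```

The result is indexed by month number (2 for February, 3 for March), which is why the columns are renamed to month names afterwards.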
A similar solution with value_counts:
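import pandas as pd

date = df[['date']]
names = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
# count the days per month number within each row's 30-day range
df = (df.apply(lambda x: pd.date_range(x['date'], periods=30).month.value_counts(), axis=1)
        .fillna(0)
        .astype(int))
# month numbers start at 1, so enumerate the names from 1
df = df.rename(columns=dict(enumerate(names, 1)))
df = date.join(df)
print (df)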
Another solution:
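import pandas as pd

names = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
date = df[['date']]
# last day of the 30-day window (start date included)
df["date1"] = df["date"] + pd.Timedelta(days=29)
# one row per window boundary, indexed by date
df = df.reset_index().melt(id_vars='index', value_name='date').set_index('date')
# fill in every day between the two boundaries of each window
df = df.groupby('index').resample('D').asfreq()
# count the days per (original row, month number)
df = (df.groupby([df.index.get_level_values(0), df.index.get_level_values(1).month])
        .size()
        .unstack(fill_value=0))
df = df.rename(columns = {k+1:v for k,v in enumerate(names)})
df = date.join(df)
print (df)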
date Jan Feb Mar Apr
0 2016-01-01 30 0 0 0
1 2015-02-10 0 19 11 0
2 2016-03-25 0 0 7 23