Count number of occurrences within a date range
from datetime import datetime
max_date = datetime.today().strftime('%d-%m-%Y')
min_date = "06-08-2021"
I have a df that looks like this. For now it only has 1 row:
name value
Name1 23
Then I have another dataset, df2, that looks like this:
date group
07-08-2021 A
07-08-2021 A
06-08-2021 A
09-08-2021 A
07-08-2021 A
07-08-2021 B
06-08-2021 B
03-08-2020 A
I want to iterate through all rows of df2 and, if the date is within the range of min_date and max_date, do a cumulative sum of all occurrences of A and B.
This means that I want to count the number of times a particular group type occurred within that range. Then I want to add that value to my first dataset. Something like this:
name value count_A count_B
Name1 23 5 2
Note that the last row:
03-08-2020 A
is not counted since the date doesn't fall in the range.
EDIT: sample df:
import pandas as pd
details = {
    'Name' : ['Name1'],
    'Value' : [23],
}
df1 = pd.DataFrame(details)
details = {
    'Date' : ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021','07-08-2021','07-08-2021','06-08-2021','03-08-2020'],
    'Group' : ['A', 'A', 'A', 'A','A','B','B','A'],
}
df2 = pd.DataFrame(details)
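import pandas as pd
details = {
    'Date' : ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021','07-08-2021','07-08-2021','06-08-2021','03-08-2020'],
    'Group' : ['A', 'A', 'A', 'A','A','B','B','A'],
}
details1 = {
    'Name' : ['Name1'],
    'Value' : [23],
}
df1 = pd.DataFrame(details1)
df = pd.DataFrame(details)
# Parse the day-month-year strings into datetimes; comparing the raw strings
# would be lexicographic, not chronological
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
max_date = pd.Timestamp.today().normalize()
min_date = pd.Timestamp('2021-08-06')
# Both bounds are inclusive, so the rows dated 06-08-2021 are counted
df = df[(df['Date'] <= max_date) & (df['Date'] >= min_date)]
# One row per group, holding the number of dates in range
df = df.groupby('Group').count()
# Turn the groups into columns and align the single row with df1 on index 0
df1_transposed = df.T
df1_transposed = df1_transposed[['A', 'B']]
df1_transposed = df1_transposed.reset_index(drop=True)
df1 = pd.merge(df1, df1_transposed, left_index=True, right_index=True)
df1 = df1[['Name', 'Value', 'A', 'B']]
df1.rename(columns = {'A':'count_A', 'B':'count_B'}, inplace = True)
print(df1)
Output:
Name Value count_A count_B
Name1 23 5 2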
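As an alternative to the transpose-and-merge steps, the counting could also be sketched with Series.between and value_counts. This is only a minimal sketch, assuming the sample data from the question and an inclusive range on both ends; the names dates, in_range, counts and result are introduced here for illustration and do not come from the original posts.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Name1'], 'Value': [23]})
df2 = pd.DataFrame({
    'Date': ['07-08-2021', '07-08-2021', '06-08-2021', '09-08-2021',
             '07-08-2021', '07-08-2021', '06-08-2021', '03-08-2020'],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'A'],
})

# Parse the day-month-year strings so that comparisons are chronological.
dates = pd.to_datetime(df2['Date'], format='%d-%m-%Y')
min_date = pd.Timestamp('2021-08-06')
max_date = pd.Timestamp.today().normalize()

# Series.between includes both endpoints by default.
in_range = dates.between(min_date, max_date)

# Count how often each group appears among the rows inside the range.
counts = df2.loc[in_range, 'Group'].value_counts()

# Add one count_<group> column per group to a copy of df1 (assumption:
# the groups of interest are known up front; a missing group counts as 0).
result = df1.copy()
for group in ['A', 'B']:
    result[f'count_{group}'] = int(counts.get(group, 0))

print(result)
```

Because the counts are looked up with a default of 0, a group that never occurs inside the range still gets a column instead of raising a KeyError.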
Preferably work with datetime.date objects instead of strings:
from datetime import date
max_date = date.today()
min_date = date(2021,8,6)
If the dates in df2 are strings, you can convert them to datetime.date objects first, while iterating through all the rows:
# example for the first iteration over df2
from datetime import date, datetime
# iterate over all dates in your df2 and include the following:
dash_date = '07-08-2021'
py_date = datetime.strptime(dash_date, '%d-%m-%Y').date()
# check whether the date of the current iteration lies between min_date and max_date
# (inclusive, so that rows dated exactly min_date are counted as in the question)
in_range = min_date <= py_date <= max_date
Based on the comparison you can decide whether you want to add the value to your first data set or not.
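Put together, the row-by-row approach might look like the sketch below. It assumes the rows of df2 are available as (date string, group) pairs and that both ends of the range are inclusive; rows, counts and record are hypothetical names used only for this illustration.

```python
from collections import Counter
from datetime import date, datetime

min_date = date(2021, 8, 6)
max_date = date.today()

# Rows of df2 as (date string, group) pairs, copied from the question.
rows = [
    ('07-08-2021', 'A'), ('07-08-2021', 'A'), ('06-08-2021', 'A'),
    ('09-08-2021', 'A'), ('07-08-2021', 'A'), ('07-08-2021', 'B'),
    ('06-08-2021', 'B'), ('03-08-2020', 'A'),
]

counts = Counter()
for dash_date, group in rows:
    # Convert the day-month-year string into a datetime.date object.
    py_date = datetime.strptime(dash_date, '%d-%m-%Y').date()
    # Count the row only if its date lies inside the inclusive range.
    if min_date <= py_date <= max_date:
        counts[group] += 1

# Extend the single record of the first dataset with the per-group counts.
record = {'name': 'Name1', 'value': 23,
          'count_A': counts['A'], 'count_B': counts['B']}
print(record)
```

With an actual DataFrame, the pairs could be produced with zip(df2['Date'], df2['Group']) instead of the hard-coded list.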