简体   繁体   中英

Group data by week in Pandas

I'm having this data frame:

 Name Date author Apple 2022-03-15 sahil_1 Orange 2022-03-16 sahil_2 Apple 2022-03-17 sahil_3 Orange 2022-03-18 sahil_1 Apple 2022-03-19 sahil_2 Banana 2022-03-20 sahil_3 Apple 2019-12-19 sahil_3 Orange 2004-01-07 sahil_1

I want to aggregate this by Name and Date(weekly) to get count of record.

Date: Group, the result should be at the beginning of the week (or just on Monday)

Count: ADD, if two or more records have same Name and fall in a same week Date time (if falls on same interval 7 days week)

The desired output is given below:

 Name Date count Apple 2019-12-16 1 Apple 2022-03-14 3 Banana 2022-03-14 1 Orange 2004-01-05 1 Orange 2022-03-14 2

Note - All date in result are monday OR say first day of week.
If possible result should be sorted in asc(increasing) order and date also for respective of every name.

Thanks in advance.

Not getting how to proceed further.

import pandas as pd 

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]

df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)

Thanks @Pedrinho for Quick help. I modified code little bit and got my desired result. But not sure if its right approach to do it.

Solution code -

import pandas as pd 

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]

df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')

df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])#['Name'].count()

result = []

for group_id, group_df in df:
    res = {}
    res['Name'] = group_id[0]
    res['Week'] = str(group_id[1])[:-9]  
    res['count'] =  group_df['Name'].count()
    result.append(res)

print(f"Result df is: {result}")

Console o/p -

Result df is: [{'Name': 'Apple', 'Week': '2019-12-16', 'count': 1}, {'Name': 'Apple', 'Week': '2022-03-14', 'count': 3}, {'Name': 'Banana', 'Week': '2022-03-14', 'count': 1}, {'Name': 'Orange', 'Week': '2004-01-05', 'count': 1}, {'Name': 'Orange', 'Week': '2022-03-14', 'count': 2}]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM