Sum weekly totals of values from one data frame based on dates in another data frame in python

Question

I want to sum the values in one column of a dataframe for certain dates that are defined by another dataframe.

My first dataframe of dates looks like this:

import numpy as np
import pandas as pd

start_date = ["2-22-16 00:00:00", "2-29-16 00:00:00", "3-7-16 00:00:00", "3-14-16 00:00:00", "3-21-16 00:00:00", "3-28-16 00:00:00", "4-4-16 00:00:00", "4-11-16 00:00:00", "4-18-16 00:00:00", "4-25-16 00:00:00", "5-2-16 00:00:00", "5-9-16 00:00:00", "5-16-16 00:00:00", "5-23-16 00:00:00", "5-30-16 00:00:00", "6-6-16 00:00:00", "6-13-16 00:00:00", "6-20-16 00:00:00", "6-27-16 00:00:00", "7-4-16 00:00:00", "7-11-16 00:00:00", "7-18-16 00:00:00", "7-25-16 00:00:00", "8-08-16 00:00:00", "8-22-16 00:00:00", "8-29-16 00:00:00", "9-5-16 00:00:00", "9-12-16 00:00:00", "9-19-16 00:00:00", "9-26-16 00:00:00", "10-3-16 00:00:00", "10-10-16 00:00:00", "10-17-16 00:00:00", "10-24-16 00:00:00", "10-31-16 00:00:00", "11-7-16 00:00:00", "11-14-16 00:00:00", "11-21-16 00:00:00", "1-23-17 00:00:00", "1-30-17 00:00:00", "2-06-17 00:00:00", "3-13-17 00:00:00", "3-27-17 00:00:00", "6-19-17 00:00:00", "6-26-17 00:00:00"]
start_date = [pd.to_datetime(d) for d in start_date]
end_date = pd.DatetimeIndex(start_date) + pd.DateOffset(7)
ndf = pd.DataFrame({'start':pd.to_datetime(start_date),'end':end_date}); ndf.head()

What I want is values from another data frame that fall within the weeks defined in ndf. My other dataframe looks something like this:

dates = ["4-17-16 04:00:00", "4-16-16 19:30:00", "4-16-16 19:00:00", "2-24-16 09:00:00", "3-21-16 02:00:00", "3-18-16 10:00:00", "3-24-16 05:00:00", "3-11-16 00:00:00"]
df = pd.DataFrame(
    {'timestamp': dates,
     'value': np.random.randint(1,25,size=(8,))})

Now I want to create a new data frame that sums all the values from df that fall between the dates in ndf. So I created this function:

def get_dates(x):
    # Select the df values between start and ending datetime. 
    n = df[(df['timestamp']>ndf['start'])&(df['timestamp']<ndf['end'])]
    # Return sum of values
    return n.values[0],n['value'].sum()

I also played around with this: n = df[(df['timestamp']>ndf['start'])&(df['timestamp']<ndf['end'])]. But I get the error: ValueError: Can only compare identically-labeled Series objects.

I'm looking for someone to help me clean up my function so that it works or provide insight on the error message above. Thanks!

Answer 1

For your specific case, where the start and end dates form one continuous time period, you would probably want to use something like this:

def get_dates():
    # Select the df values between start and ending datetime. 
    n = df[(df['timestamp'] > ndf['start'].min()) & 
           (df['timestamp'] < ndf['end'].max())]
    # Return sum of values
    return n.values[0], n['value'].sum()

Your error means that you are comparing two Series whose indexes do not match, here because they have different lengths: your ndf has 45 rows while df has 1000.
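To make the error message concrete, here is a minimal sketch that reproduces it. The two small Series a and b are hypothetical stand-ins for df['timestamp'] and ndf['start']; they are not taken from your data.

```python
import pandas as pd

# Hypothetical stand-ins: two Series with different lengths (and so different indexes)
a = pd.Series([1, 2, 3])
b = pd.Series([1, 2])

try:
    a > b  # element-wise comparison needs matching index labels
except ValueError as exc:
    message = str(exc)
    print(message)

# Comparing against a single scalar (as .min() / .max() do above) is fine
mask = a > b.min()
print(mask.tolist())
```

Reducing one side to a scalar is why the version with ndf['start'].min() and ndf['end'].max() does not raise.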

Edit: I am not sure if there is a prettier solution for a discontinuous time period than to iterate over both dataframes:

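def get_dates():
    total = 0
    for _, values_row in df.iterrows():
        # Make sure the timestamp is a datetime, not a string
        timestamp = pd.to_datetime(values_row['timestamp'])
        for _, week_row in ndf.iterrows():
            if week_row['start'] < timestamp < week_row['end']:
                # Add the value once, then stop checking the remaining weeks
                total += values_row['value']
                break
    return total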
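
If the nested loops get slow, one possible alternative is pd.merge_asof, which attaches to each timestamp the most recent start at or before it. The sketch below is only an illustration under assumptions: the small ndf and df are hypothetical sample frames with fixed values, and the windows are treated as half-open (start <= timestamp < end), which differs slightly from the strict comparison above.

```python
import pandas as pd

# Hypothetical sample data: three weekly windows with a gap between the last two
ndf = pd.DataFrame({"start": pd.to_datetime(["2016-02-22", "2016-02-29", "2016-03-14"])})
ndf["end"] = ndf["start"] + pd.Timedelta(days=7)

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-02-24 09:00", "2016-03-01 10:00", "2016-03-02 12:00",
        "2016-03-11 00:00", "2016-03-18 10:00",
    ]),
    "value": [5, 1, 2, 100, 7],
})

# merge_asof requires both frames to be sorted by their join keys
matched = pd.merge_asof(
    df.sort_values("timestamp"),
    ndf.sort_values("start"),
    left_on="timestamp",
    right_on="start",
)

# Drop rows that fall into a gap (after the matched window has already ended)
inside = matched[matched["timestamp"] < matched["end"]]

# Sum per window; windows without any rows get 0
weekly = inside.groupby("start")["value"].sum().reindex(ndf["start"], fill_value=0)
print(weekly)
```

Here the row at 2016-03-11 is dropped because it falls in the gap between the second and third windows.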

Answer 2

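Use resample when you want to group data by evenly-spaced time intervals. This assumes the timestamp column has already been converted to datetimes (for example with pd.to_datetime), since resample only works on a datetime-like index.

df.set_index('timestamp').resample('W-MON', label='left').sum().reset_index()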

Returns:

   timestamp  value
0 2016-02-22   22.0
1 2016-02-29    NaN
2 2016-03-07   13.0
3 2016-03-14   20.0
4 2016-03-21    9.0
5 2016-03-28    NaN
6 2016-04-04    NaN
7 2016-04-11   34.0

2 answers

solution1
2 2017-11-09 16:53:51

solution2
1 ACCEPTED 2017-11-09 20:45:23
