
Calculate 3-minutely mean and assign boolean to 1-minutely value exceeding threshold in pandas dataframe column

I've got a dataframe 'activity_level'. Here is the column I want to use:

activity_level['MP']
Date_and_time
2020-07-24 21:00:00    0.0
2020-07-24 21:01:00    0.0
2020-07-24 21:02:00    0.0
2020-07-24 21:03:00    0.0
2020-07-24 21:04:00    0.0
2020-07-24 21:05:00    0.0
2020-07-24 21:06:00    0.0
2020-07-24 21:07:00    0.0
2020-07-24 21:08:00    0.0
2020-07-24 21:09:00    0.0
2020-07-24 21:10:00    0.0
2020-07-24 21:11:00    0.0
2020-07-24 21:12:00    0.0
2020-07-24 21:13:00    0.0
2020-07-24 21:14:00    0.0
2020-07-24 21:15:00    0.0
2020-07-24 21:16:00    0.0
2020-07-24 21:17:00    0.0
2020-07-24 21:18:00    0.0
2020-07-24 21:19:00    0.0
2020-07-24 21:20:00    0.0
2020-07-24 21:21:00    0.0
2020-07-24 21:22:00    0.0
2020-07-24 21:23:00    0.0
2020-07-24 21:24:00    0.0
2020-07-24 21:25:00    0.0
2020-07-24 21:26:00    0.0
2020-07-24 21:27:00    0.0
2020-07-24 21:28:00    0.0
2020-07-24 21:29:00    0.0
2020-07-24 21:30:00    0.0
2020-07-24 21:31:00    0.0
2020-07-24 21:32:00    0.0
2020-07-24 21:33:00    0.0
2020-07-24 21:34:00    0.0
2020-07-24 21:35:00    0.0
2020-07-24 21:36:00    0.0
2020-07-24 21:37:00    0.0
2020-07-24 21:38:00    0.0
2020-07-24 21:39:00    0.0
2020-07-24 21:40:00    0.0
2020-07-24 21:41:00    0.0
2020-07-24 21:42:00    0.0
2020-07-24 21:43:00    0.0
2020-07-24 21:44:00    0.0
2020-07-24 21:45:00    0.0
2020-07-24 21:46:00    0.0
2020-07-24 21:47:00    0.0
2020-07-24 21:48:00    0.0
2020-07-24 21:49:00    0.0
2020-07-24 21:50:00    0.0
2020-07-24 21:51:00    0.0
2020-07-24 21:52:00    0.0
2020-07-24 21:53:00    0.0
2020-07-24 21:54:00    0.0
2020-07-24 21:55:00    0.0
2020-07-24 21:56:00    0.0
2020-07-24 21:57:00    0.0
2020-07-24 21:58:00    0.0
2020-07-24 21:59:00    0.0
Name: MP, dtype: float64

I want to calculate the 3-minutely mean and flag each 1-minutely value according to it: a one if the 3-minutely mean exceeds 15, and a zero otherwise. For example, the mean of the first 3 values in activity_level['MP'] is 0, so I want to assign a zero to those first 3 values. I have created an empty column to hold these zeroes and ones.

I've tried the following, but I can't get it to work right:

# create empty column
activity_level['walking_frame'] = ""
# calculate 3-minutely mean
walking_activity = activity_level.resample('180s').mean()
# create linspaced vector to loop over
vector = np.linspace(0,60,20,endpoint=False).tolist()
vector = [ int(x) for x in vector ]
activity_level2 = activity_level.copy()
# loop to fill in zeroes or ones in empty column
for MP_id,MP in enumerate(walking_activity['MP']):
    if MP > 15:
        activity_level2['walking_frame'][vector[MP_id]:vector[MP_id+1]] == 1
    else:
        activity_level2['walking_frame'][vector[MP_id]:vector[MP_id+1]] == 0

Any help would be much appreciated!

As I understand it, you are looking for a rolling mean and a shifted result based on it. For future questions, please provide a DataFrame definition and an example that actually produces some results for your test: none of the 60 rows you posted would ever trigger your intended action. Hence, for the fun of it, I've used the Fibonacci sequence as values.

This solution assumes that your data consists of minutely values, as displayed in your example.

import pandas as pd

# create a test dataframe
test_df = pd.DataFrame({
    "Date_and_time" : ["2020-07-24 21:00:00", "2020-07-24 21:01:00", "2020-07-24 21:02:00",
        "2020-07-24 21:03:00", "2020-07-24 21:04:00", "2020-07-24 21:05:00", "2020-07-24 21:06:00",
        "2020-07-24 21:07:00", "2020-07-24 21:08:00", "2020-07-24 21:09:00"],
    "value" : [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
})

# cast the times as datetimes
test_df = test_df.assign(Date_and_time = lambda x : pd.to_datetime(x.Date_and_time))

# this is all you need, above is just setup
res = (
    test_df
        .assign(
            # create a rolling mean
            rolling_mean = lambda x : x.value.rolling(3).mean(),
            # create an indicator ("alert") that is True if the rolling mean
            # 2 minutes later is greater than 15
            alert = lambda x : x.rolling_mean.shift(-2) > 15)

)

print(res)

And the output looks like this (the rolling mean of minutes 7 through 9 is 22.67, so you are alerted at minute 7):

        Date_and_time  value  rolling_mean  alert
0 2020-07-24 21:00:00    0.0           NaN  False
1 2020-07-24 21:01:00    1.0           NaN  False
2 2020-07-24 21:02:00    1.0      0.666667  False
3 2020-07-24 21:03:00    2.0      1.333333  False
4 2020-07-24 21:04:00    3.0      2.000000  False
5 2020-07-24 21:05:00    5.0      3.333333  False
6 2020-07-24 21:06:00    8.0      5.333333  False
7 2020-07-24 21:07:00   13.0      8.666667   True
8 2020-07-24 21:08:00   21.0     14.000000  False
9 2020-07-24 21:09:00   34.0     22.666667  False
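
If what you actually want is the question's original idea (one mean per non-overlapping 3-minute block, written back to every row of that block), a minimal sketch could look like the following. It assumes the DataFrame has a DatetimeIndex; the sample values and the name `activity` are made up for illustration, and `walking_frame` is taken from the question.

```python
import pandas as pd

# Hypothetical sample data: nine minutely readings, i.e. three 3-minute blocks
idx = pd.date_range("2020-07-24 21:00:00", periods=9, freq="min")
activity = pd.DataFrame(
    {"MP": [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 20.0, 13.0, 21.0]}, index=idx
)
activity.index.name = "Date_and_time"

# Mean of each non-overlapping 3-minute block, broadcast back to
# every row that belongs to the block
block_mean = activity.groupby(pd.Grouper(freq="3min"))["MP"].transform("mean")

# 1 where the block mean exceeds 15, otherwise 0
activity["walking_frame"] = (block_mean > 15).astype(int)

print(activity)
```

Because `transform` returns a result aligned with the original index, no explicit loop over positions is needed.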

Side note: if your data is not exactly minutely as in the example, you can set the Date_and_time column as the index and use the .rolling() function with a time window. See the pandas docs for the rolling function.
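
A rough sketch of that time-window variant, assuming a gap in the data (the values and the names `irregular` and `above_threshold` are hypothetical). With an offset such as "3min", each window covers the 3 minutes up to and including the row's own timestamp, so the flag here marks the end of a window rather than its start; a row-based `shift(-2)` would no longer correspond to 2 minutes once rows are missing.

```python
import pandas as pd

# Hypothetical irregular data: the 21:02 reading is missing
irregular = pd.DataFrame(
    {
        "Date_and_time": pd.to_datetime(
            ["2020-07-24 21:00:00", "2020-07-24 21:01:00", "2020-07-24 21:03:00",
             "2020-07-24 21:04:00", "2020-07-24 21:05:00"]
        ),
        "value": [10.0, 14.0, 30.0, 12.0, 0.0],
    }
).set_index("Date_and_time")

# Time-based window: all rows in the 3 minutes ending at each timestamp
irregular["rolling_mean"] = irregular["value"].rolling("3min").mean()

# True where the mean over that trailing window exceeds 15
irregular["above_threshold"] = irregular["rolling_mean"] > 15

print(irregular)
```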
