
Applying max, min and last index within a pandas groupby function (Python)

The code below separates the data into months with the month_changes. Values and Val_dates are correlated: Val_dates is supposed to hold the matching date for each index of Values.

So the pairs are [100, '2015-11-01 01:03:00'], [123, '2015-11-08 12:56:00'], and so on. Each row in the multidimensional array is supposed to represent a single month, so the first row [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72] is for November 2015, the sixth row [163.88, 173.72] is for February 2016, etc.

It outputs the last index of each month, so the output for the first row would be 123. The output of the chain up to the .agg call is array([list([100.0, 123.0, 135.3, 139.05, 156.08]), list([0]), list([0]), list([163.88, 173.72]), 0], dtype=object).

How can I add to the existing code so that it outputs the max, min and last index of each array, resulting in the expected output below?

import numpy as np
import pandas as pd

Values = np.array(
    [
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
        [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72],
    ]
)

# dtype=object is needed because the rows have different lengths (ragged array)
Values = np.array([arr[i:] for i, arr in enumerate(Values.tolist())], dtype=object)

Val_dates = [
    "2015-11-01 01:03:00",
    "2015-11-08 12:56:00",
    "2015-11-11 02:30:00",
    "2015-11-14 04:23:00",
    "2015-11-14 05:23:00",
    "2016-02-11 02:00:00",
    "2016-02-15 15:00:00",
]

# An explicit unit ("datetime64[ns]") is required by recent pandas versions
df = pd.DataFrame({"dt": Val_dates, "val": Values}).astype({"dt": "datetime64[ns]"})
idx = pd.date_range("2015-11-01 00:00:00", "2016-03-01 00:00:00", freq="MS")

display(
    df.groupby(pd.Grouper(freq="MS", key="dt"))["val"]
    .apply(lambda x: x.head(1).squeeze()[: len(x)] if len(x) else [0])
    .reindex(idx, fill_value=0)
    .to_numpy()
    .agg(
        # Get max of the duration column for each group
        max_duration=(lambda x: np.max(x)),
        # Get min of the duration column for each group
        min_duration=(lambda x: np.min(x)),
        # Last indexes
        last_index=(lambda x: x[len(x) - 1]),
    )
)

Expected Output:

[Image: enter image description here]

So, given what the last functional part of your code outputs:

import numpy as np
import pandas as pd
...
arr = (
    df.groupby(pd.Grouper(freq="MS", key="dt"))["val"]
    .apply(lambda x: x.head(1).squeeze()[: len(x)] if len(x) else [0])
    .reindex(idx, fill_value=0)
    .to_numpy()
)
print(arr)
# Outputs a list of lists and one integer
[[100.0, 123.0, 135.3, 139.05, 156.08], [0], [0], [163.88, 173.72], 0]
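
As a side note on why the original chain stops at `.agg`: `.to_numpy()` returns a NumPy array, and NumPy arrays have no `agg` method, whereas a pandas Series does. The sketch below uses a small hand-built object array (an assumption, standing in for `arr`) and a hypothetical helper `as_list` to show that wrapping the array back into a Series makes element-wise processing available again.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for `arr`: an object array mixing lists and a scalar.
sample = np.empty(3, dtype=object)
sample[0] = [100.0, 123.0]
sample[1] = [0]
sample[2] = 0

# ndarray has no .agg, while a pandas Series does.
ndarray_has_agg = hasattr(sample, "agg")
series_has_agg = hasattr(pd.Series(sample), "agg")


def as_list(item):
    """Hypothetical helper: wrap a bare scalar so every entry is a list."""
    return item if isinstance(item, list) else [item]


# Element-wise work on the Series, here the last value of each entry.
last_values = pd.Series(sample).map(lambda item: as_list(item)[-1])
print(ndarray_has_agg, series_has_agg)
print(last_values.tolist())
```
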

You could try this:

new_df = pd.DataFrame({"dates": idx})

for i, item in enumerate(list(arr)):
    if not isinstance(item, list):
        item = [item]
    new_df.loc[i, "min"] = min(item)
    new_df.loc[i, "max"] = max(item)
    new_df.loc[i, "lastindex"] = item[-1]

Then:

print(new_df)
# Outputs
       dates     min     max  lastindex
0 2015-11-01  100.00  156.08     156.08
1 2015-12-01    0.00    0.00       0.00
2 2016-01-01    0.00    0.00       0.00
3 2016-02-01  163.88  173.72     173.72
4 2016-03-01    0.00    0.00       0.00
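
If the nested lists are not needed for anything else, a loop-free variant might be to group the flat values directly and let pandas compute the three statistics. This is only a sketch under that assumption: `values`, `val_dates`, `flat_df` and `summary` are names introduced here, with the data copied from the question, and months without data are filled with 0 to mirror the table above.

```python
import pandas as pd

# Flat inputs copied from the question: one value per timestamp.
values = [100, 123, 135.3, 139.05, 156.08, 163.88, 173.72]
val_dates = [
    "2015-11-01 01:03:00",
    "2015-11-08 12:56:00",
    "2015-11-11 02:30:00",
    "2015-11-14 04:23:00",
    "2015-11-14 05:23:00",
    "2016-02-11 02:00:00",
    "2016-02-15 15:00:00",
]

flat_df = pd.DataFrame({"dt": pd.to_datetime(val_dates), "val": values})
idx = pd.date_range("2015-11-01", "2016-03-01", freq="MS")

summary = (
    flat_df.groupby(pd.Grouper(key="dt", freq="MS"))["val"]
    .agg(["min", "max", "last"])            # one row per month-start bin
    .rename(columns={"last": "lastindex"})  # match the column name used above
    .reindex(idx)                           # add months missing from the data
    .fillna(0)                              # empty months become 0
)
print(summary)
```

Here the month start is the index rather than a `dates` column; calling `summary.reset_index()` would turn it back into a column if that layout is preferred.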
