How to generate values inside a time period in a pandas DataFrame?

Question

I have data recorded at specific periods (time_order 0, 1, 2, ...) and I would like to generate intermediate values inside each period. The step between generated values should be the difference between two consecutive values divided by the number of sub-periods, which I want to be able to set.

For instance:

import pandas as pd

data = [{'metric': '3f00d0b5', 'time':52.66, 'time_order': 0, 'variable': 'var1', 'value': 0.035},
        {'metric': '3f00d0b5', 'time':422.4, 'time_order': 1, 'variable': 'var1', 'value': 0.512},
        {'metric': '3f00d0b5', 'time':620.1, 'time_order': 2, 'variable': 'var1', 'value': 0.0},
        
        {'metric': '3f00d0b5', 'time':52.66, 'time_order': 0, 'variable': 'var2', 'value': 0.007},
        {'metric': '3f00d0b5', 'time':422.4, 'time_order': 1, 'variable': 'var2', 'value': 0.012},
        {'metric': '3f00d0b5', 'time':620.1, 'time_order': 2, 'variable': 'var2', 'value': 0.214},
            
        {'metric': '83e7fdd1', 'time':25.42, 'time_order': 0, 'variable': 'var1', 'value': 0.0},
        {'metric': '83e7fdd1', 'time':322.45, 'time_order': 1, 'variable': 'var1', 'value': 0.241},
        {'metric': '83e7fdd1', 'time':678.12, 'time_order': 2, 'variable': 'var1', 'value': 0.005},
        
        {'metric': '83e7fdd1', 'time':25.42, 'time_order': 0, 'variable': 'var2', 'value': 0.02},
        {'metric': '83e7fdd1', 'time':322.45, 'time_order': 1, 'variable': 'var2', 'value': 0.007},
        {'metric': '83e7fdd1', 'time':678.12, 'time_order': 2, 'variable': 'var2', 'value': 0.0}
]
    
df = pd.DataFrame.from_dict(data)

Based on the data above, the final result I'm looking for is:

{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0, 'variable': 'var1', 'value': 0.035},
{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0.1, 'variable': 'var1', 'value': 0.083},
...
{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0.9, 'variable': 'var1', 'value': 0.4643},
{'metric': '3f00d0b5',  'time':422.4, 'time_order': 1, 'variable': 'var1', 'value': 0.512},
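
To make the arithmetic behind these rows explicit, here is a minimal sketch for var1 of metric 3f00d0b5 between periods 0 and 1. It assumes 10 sub-steps per period; the names start, end and n_steps are only illustrative and not part of the data above.

```python
import numpy as np

start, end = 0.035, 0.512   # values at time_order 0 and 1
n_steps = 10                # assumed number of sub-periods per period

# Size of one step: difference of the two values divided by the number of sub-periods
step = (end - start) / n_steps

# np.linspace includes both endpoints, so it returns n_steps + 1 points
values = np.linspace(start, end, n_steps + 1)
time_order = np.linspace(0, 1, n_steps + 1)

for t, v in zip(time_order, values):
    print(f"{t:.1f} -> {v:.4f}")
```

With these numbers the step is 0.0477, which gives roughly 0.083 at time_order 0.1 and 0.4643 at time_order 0.9, matching the rows listed above.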

Is there a straightforward, Pythonic way to implement this?

Thank you in advance, Leonardo

Answer 1

You can use groupby and a custom function to augment the data:

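import numpy as np

def data_augment(df):
    # Build a 0.1-step grid from the first to the last period of this group
    new_index = np.arange(df['time_order'].min(), df['time_order'].max()+0.1, 0.1)
    # Reindex 'value' on the finer grid and fill the gaps by linear interpolation
    return (df.set_index('time_order')['value']
              .reindex(new_index).interpolate())

# 'time' is not carried through the interpolation, so list the remaining columns
# explicitly (indexing with df.columns would raise a KeyError on 'time')
out = (df.groupby(['metric', 'variable']).apply(data_augment)
         .stack().rename('value').reset_index()
         [['metric', 'time_order', 'variable', 'value']])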

Output:

>>> out
      metric  time_order variable   value
0   3f00d0b5         0.0     var1  0.0350
1   3f00d0b5         0.1     var1  0.0827
2   3f00d0b5         0.2     var1  0.1304
3   3f00d0b5         0.3     var1  0.1781
4   3f00d0b5         0.4     var1  0.2258
..       ...         ...      ...     ...
79  83e7fdd1         1.6     var2  0.0028
80  83e7fdd1         1.7     var2  0.0021
81  83e7fdd1         1.8     var2  0.0014
82  83e7fdd1         1.9     var2  0.0007
83  83e7fdd1         2.0     var2  0.0000

[84 rows x 4 columns]
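
The output above no longer has the time column, whereas the expected result in the question repeats the time of the preceding period on the generated rows. A possible variant is sketched below under that assumption: time is forward-filled and value is interpolated linearly. The function name augment_keep_time and its step parameter are hypothetical, and only the rows for metric 3f00d0b5 are used to keep the example short.

```python
import numpy as np
import pandas as pd

# Subset of the question's data (metric 3f00d0b5 only)
data = [
    {'metric': '3f00d0b5', 'time': 52.66, 'time_order': 0, 'variable': 'var1', 'value': 0.035},
    {'metric': '3f00d0b5', 'time': 422.4, 'time_order': 1, 'variable': 'var1', 'value': 0.512},
    {'metric': '3f00d0b5', 'time': 620.1, 'time_order': 2, 'variable': 'var1', 'value': 0.0},
    {'metric': '3f00d0b5', 'time': 52.66, 'time_order': 0, 'variable': 'var2', 'value': 0.007},
    {'metric': '3f00d0b5', 'time': 422.4, 'time_order': 1, 'variable': 'var2', 'value': 0.012},
    {'metric': '3f00d0b5', 'time': 620.1, 'time_order': 2, 'variable': 'var2', 'value': 0.214},
]
df = pd.DataFrame(data)


def augment_keep_time(group, step=0.1):
    """Return one group on a finer time_order grid (hypothetical helper)."""
    group = group.astype({'time_order': float})
    lo, hi = group['time_order'].min(), group['time_order'].max()
    n = int(round((hi - lo) / step)) + 1
    # linspace plus rounding avoids float noise such as 0.30000000000000004
    new_index = np.round(np.linspace(lo, hi, n), 10)
    g = group.set_index('time_order')[['time', 'value']].reindex(new_index)
    g['value'] = g['value'].interpolate()  # linear interpolation between periods
    g['time'] = g['time'].ffill()          # assumption: repeat the previous period's time
    return g.rename_axis('time_order').reset_index()


# Process each (metric, variable) group separately and put the keys back as columns
pieces = [
    augment_keep_time(group).assign(metric=metric, variable=variable)
    for (metric, variable), group in df.groupby(['metric', 'variable'])
]
out = pd.concat(pieces, ignore_index=True)[df.columns.tolist()]
print(out.head(12))
```

An explicit loop over the groups is used here instead of groupby.apply so that the sketch does not depend on how a given pandas version handles the grouping columns inside apply. Whether time should be forward-filled or interpolated as well is a design choice the question leaves open; swapping ffill() for interpolate() on that column would give the latter.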

Answer 1: accepted, 2023-01-30 20:42:49