
How to generate values inside a time period in a python dataframe?

I have some data recorded at discrete periods (time_order 0, 1, 2, ...) and I would like to generate intermediate rows inside each period. Each intermediate value should advance by the difference between two consecutive period values divided by the number of sub-steps I choose.

For instance:

import pandas as pd

data = [{'metric': '3f00d0b5', 'time':52.66, 'time_order': 0, 'variable': 'var1', 'value': 0.035},
        {'metric': '3f00d0b5', 'time':422.4, 'time_order': 1, 'variable': 'var1', 'value': 0.512},
        {'metric': '3f00d0b5', 'time':620.1, 'time_order': 2, 'variable': 'var1', 'value': 0.0},
        
        {'metric': '3f00d0b5', 'time':52.66, 'time_order': 0, 'variable': 'var2', 'value': 0.007},
        {'metric': '3f00d0b5', 'time':422.4, 'time_order': 1, 'variable': 'var2', 'value': 0.012},
        {'metric': '3f00d0b5', 'time':620.1, 'time_order': 2, 'variable': 'var2', 'value': 0.214},
            
        {'metric': '83e7fdd1', 'time':25.42, 'time_order': 0, 'variable': 'var1', 'value': 0.0},
        {'metric': '83e7fdd1', 'time':322.45, 'time_order': 1, 'variable': 'var1', 'value': 0.241},
        {'metric': '83e7fdd1', 'time':678.12, 'time_order': 2, 'variable': 'var1', 'value': 0.005},
        
        {'metric': '83e7fdd1', 'time':25.42, 'time_order': 0, 'variable': 'var2', 'value': 0.02},
        {'metric': '83e7fdd1', 'time':322.45, 'time_order': 1, 'variable': 'var2', 'value': 0.007},
        {'metric': '83e7fdd1', 'time':678.12, 'time_order': 2, 'variable': 'var2', 'value': 0.0}
]
    
df = pd.DataFrame.from_dict(data)

Based on the data above the final result I'm looking for is:

{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0, 'variable': 'var1', 'value': 0.035},
{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0.1, 'variable': 'var1', 'value': 0.083},
...
{'metric': '3f00d0b5',  'time':52.66, 'time_order': 0.9, 'variable': 'var1', 'value': 0.4643},
{'metric': '3f00d0b5',  'time':422.4, 'time_order': 1, 'variable': 'var1', 'value': 0.512},
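To make the arithmetic behind these expected rows explicit: with 10 sub-steps per period, each intermediate value is the starting value plus a fraction of the difference to the next period's value. A minimal sketch for the first period of 3f00d0b5 / var1, where `n_steps` and `interpolated` are names chosen here purely for illustration:

```python
# Linear steps between two consecutive periods of one (metric, variable) pair.
start, end = 0.035, 0.512   # values at time_order 0 and 1 for 3f00d0b5 / var1
n_steps = 10                # assumed number of sub-steps per period

# Each sub-step adds the same share of the total difference.
step = (end - start) / n_steps
interpolated = [round(start + k * step, 4) for k in range(n_steps + 1)]

# Entries 1 and 9 correspond to time_order 0.1 and 0.9.
print(interpolated[1], interpolated[9])
```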

Is there a straightforward, Pythonic way to implement this?

Thank you in advance, Leonardo

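You can use groupby with a custom function that reindexes each group onto a finer time_order grid and fills the new rows by linear interpolation:

import numpy as np

def data_augment(df):
    # Grid from the group's first to last time_order in steps of 0.1
    new_index = np.arange(df['time_order'].min(), df['time_order'].max()+0.1, 0.1)
    # Reindex onto the grid (new rows are NaN), then interpolate linearly
    return (df.set_index('time_order')['value']
              .reindex(new_index).interpolate())

# 'time' is not carried through, so select the remaining columns explicitly
# (indexing with df.columns would raise a KeyError for 'time')
out = (df.groupby(['metric', 'variable']).apply(data_augment)
         .stack().rename('value').reset_index()
         [['metric', 'time_order', 'variable', 'value']])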

Output:

>>> out
      metric  time_order variable   value
0   3f00d0b5         0.0     var1  0.0350
1   3f00d0b5         0.1     var1  0.0827
2   3f00d0b5         0.2     var1  0.1304
3   3f00d0b5         0.3     var1  0.1781
4   3f00d0b5         0.4     var1  0.2258
..       ...         ...      ...     ...
79  83e7fdd1         1.6     var2  0.0028
80  83e7fdd1         1.7     var2  0.0021
81  83e7fdd1         1.8     var2  0.0014
82  83e7fdd1         1.9     var2  0.0007
83  83e7fdd1         2.0     var2  0.0000

[84 rows x 4 columns]
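The output above has no `time` column, whereas the expected result in the question repeats each period's `time` on the intermediate rows. One possible variant, offered as a sketch rather than as the answer's own method: build the grid from integer counts (which avoids floating-point drift from a fractional `np.arange` step), interpolate `value`, and forward-fill `time`. The helper name `augment_group` and its `steps` parameter are assumptions introduced here, and the sketch assumes `time_order` holds whole numbers in the input.

```python
import numpy as np
import pandas as pd

def augment_group(g, steps=10):
    """Add `steps - 1` evenly spaced rows between consecutive time_order values."""
    lo, hi = int(g['time_order'].min()), int(g['time_order'].max())
    # Integer counts divided by `steps` land exactly on the whole periods.
    grid = pd.Index(np.arange(lo * steps, hi * steps + 1) / steps, name='time_order')
    out = g.set_index('time_order')[['time', 'value']].reindex(grid)
    out['value'] = out['value'].interpolate()   # linear between known points
    out['time'] = out['time'].ffill()           # repeat the period's own time
    return out

# Small self-contained sample: one metric/variable pair from the question.
df = pd.DataFrame({
    'metric': ['3f00d0b5'] * 3,
    'time': [52.66, 422.4, 620.1],
    'time_order': [0, 1, 2],
    'variable': ['var1'] * 3,
    'value': [0.035, 0.512, 0.0],
})

# Augment each group separately and re-attach its keys as columns.
parts = []
for (metric, variable), g in df.groupby(['metric', 'variable']):
    aug = augment_group(g).reset_index()
    aug.insert(0, 'metric', metric)
    aug['variable'] = variable
    parts.append(aug)

out = pd.concat(parts, ignore_index=True)[
    ['metric', 'time', 'time_order', 'variable', 'value']
]
print(out.head(3))
```

Looping over the groups and concatenating also avoids relying on every group sharing the same grid, which the `apply(...).stack()` form needs in order to return a wide DataFrame.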


 