简体   繁体   中英

How to calculate difference in a pandas dataframe depending on column value and different datetimeindex?

I would like to calculate the max difference between n categories. Please note that the timestamp will differ in their milliseconds, but I only want to calculate the difference rowwise within a set time interval.

My data looks like this:

TimeStamp                   Category    Value
2020-12-07 07:44:08.679     A       7.40    
2020-12-07 07:44:08.682     B       8.18    
2020-12-07 07:44:08.685     C       6.95    
2020-12-07 07:44:11.271     A       249.30
2020-12-07 07:44:11.272     B       249.75
2020-12-07 07:44:11.273     C       250.58
2020-12-07 07:44:13.746     A       250.18
2020-12-07 07:44:13.747     B       250.78
2020-12-07 07:44:13.748     C       251.33
2020-12-07 07:44:16.225     A       249.87
2020-12-07 07:44:16.226     B       251.48
2020-12-07 07:44:16.227     C       251.43  

And I want to achieve something like this:

TimeStamp           Difference
2020-12-07 07:44:08     1,23
2020-12-07 07:44:11     1,28
2020-12-07 07:44:13     1,15
2020-12-07 07:44:16     1,61

But I am stuck where I could truncate or group the TimeStamps to their precise seconds.

I recreated your data frame with this:

df = pd.read_csv(io.StringIO(
"""
TimeStamp,Category,Value
2020-12-07 07:44:08.679,     A,       7.40    
2020-12-07 07:44:08.682,     B,       8.18    
2020-12-07 07:44:08.685,     C,       6.95    
2020-12-07 07:44:11.271,     A,       249.30
2020-12-07 07:44:11.272,     B,       249.75
2020-12-07 07:44:11.273,     C,       250.58
2020-12-07 07:44:13.746,     A,       250.18
2020-12-07 07:44:13.747,     B,       250.78
2020-12-07 07:44:13.748,     C,       251.33
2020-12-07 07:44:16.225,     A,       249.87
2020-12-07 07:44:16.226,     B,       251.48
2020-12-07 07:44:16.227,     C,       251.43  
"""
), sep=",")
df.TimeStamp = pd.to_datetime(df.TimeStamp)

I made some assumptions before tackling the task. First, I assume you don't actually use the Category variable so I drop that column.

df = df.drop("Category", axis=1)
``

I also remove the micro seconds as we are going to group on the seconds level. 

```python
df.TimeStamp = df.TimeStamp.apply(lambda x: x.replace(microsecond=0))

Then I use the groupby function on the TimeStamp column and use an aggregate function which subtracts the minimum value in the group by the maximum value as such:

df = df.groupby(df["TimeStamp"]).agg(lambda x: max(x) - min(x))

And we end up with this data frame:

                     Value
TimeStamp                 
2020-12-07 07:44:08   1.23
2020-12-07 07:44:11   1.28
2020-12-07 07:44:13   1.15
2020-12-07 07:44:16   1.61

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM