So I have the following pandas dataframe, sorted by Timestamp ascending:
Timestamp,Point,Value
2019-09-01,A,1
2019-09-01,B,2
2019-09-02,A,1
2019-09-02,B,2
2019-09-03,A,3
2019-09-03,B,4
2019-09-04,A,3
2019-09-04,B,4
2019-09-05,A,1
2019-09-05,B,2
This dataframe contains readings of the value of different "points" at different moments in time. In this example, A and B are read once a day, but some readings are the same as the previous reading for that point.
I need to apply a transformation that will only leave rows whose Value column has changed from the previous reading for the same point.
|Timestamp |Point|Value|
|----------|-----|-----|
|2019-09-01|A |1 | // A = 1
|2019-09-01|B |2 | // B = 2
|2019-09-02|A |1 | // A unchanged, should be removed
|2019-09-02|B |2 | // B unchanged, should be removed
|2019-09-03|A |3 | // A = 3
|2019-09-03|B |4 | // B = 4
|2019-09-04|A |3 | // A unchanged, should be removed
|2019-09-04|B |4 | // B unchanged, should be removed
|2019-09-05|A |1 | // A = 1
|2019-09-05|B |2 | // B = 2
In this simplified example, I'd want to get a dataframe like the following, which only includes values that differ from the previous reading for the same point:
|Timestamp |Point|Value|
|----------|-----|-----|
|2019-09-01|A |1 |
|2019-09-01|B |2 |
|2019-09-03|A |3 |
|2019-09-03|B |4 |
|2019-09-05|A |1 |
|2019-09-05|B |2 |
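To pin down the rule before reaching for vectorised pandas, here is a minimal plain-loop sketch. It assumes the CSV above is loaded with `pd.read_csv` via `io.StringIO`, and `keep_changes_loop` is a hypothetical helper name, not part of pandas. It walks the rows in Timestamp order, remembers the last Value seen for each Point, and keeps a row only when the Value differs (the first reading of each Point is always kept).

```python
import io

import pandas as pd

# Sample data copied from the question above.
CSV = """Timestamp,Point,Value
2019-09-01,A,1
2019-09-01,B,2
2019-09-02,A,1
2019-09-02,B,2
2019-09-03,A,3
2019-09-03,B,4
2019-09-04,A,3
2019-09-04,B,4
2019-09-05,A,1
2019-09-05,B,2
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])


def keep_changes_loop(frame):
    """Hypothetical reference helper: keep a row only if its Value differs
    from the previous reading of the same Point. The first reading of each
    Point is always kept because there is nothing to compare it with."""
    last_seen = {}  # Point -> Value of its most recent reading
    keep = []       # index labels of the rows to retain
    ordered = frame.sort_values('Timestamp', kind='stable')
    for idx, point, value in zip(ordered.index, ordered['Point'], ordered['Value']):
        if point not in last_seen or last_seen[point] != value:
            keep.append(idx)
        last_seen[point] = value
    return frame.loc[keep]


print(keep_changes_loop(df))
```

A loop like this is slow on large frames, but it is handy as a reference to check the vectorised answers against.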
You can reshape the dataframe so that the unique Timestamps are the rows and the Points are the columns, then keep each value only where it differs from the previous row (otherwise assign NaN), and stack() back to long format:
m = (df.set_index(['Timestamp', 'Point']).unstack()
       .where(lambda x: x.ne(x.shift()))   # NaN where a value is unchanged
       .stack().reset_index())
Or, as two separate statements:
m = df.set_index(['Timestamp', 'Point']).unstack()   # rows: Timestamp, columns: Point
m = m.where(m.ne(m.shift())).stack().reset_index()   # NaN-out unchanged values; stack() drops the NaN rows
print(m)
Timestamp Point Value
0 2019-09-01 A 1.0
1 2019-09-01 B 2.0
2 2019-09-03 A 3.0
3 2019-09-03 B 4.0
4 2019-09-05 A 1.0
5 2019-09-05 B 2.0
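Below is a self-contained sketch of the same reshape idea, assuming the sample CSV is loaded with `pd.read_csv`. It selects the `Value` column before unstacking so the columns are just the Points, and it adds an explicit `.dropna()` after `stack()`. That extra step is a hedge: the legacy `stack()` drops all-NaN rows by default, but the newer implementation (`future_stack=True` in pandas 2.1+) keeps them, so being explicit should give the same result on either.

```python
import io

import pandas as pd

CSV = """Timestamp,Point,Value
2019-09-01,A,1
2019-09-01,B,2
2019-09-02,A,1
2019-09-02,B,2
2019-09-03,A,3
2019-09-03,B,4
2019-09-04,A,3
2019-09-04,B,4
2019-09-05,A,1
2019-09-05,B,2
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])

# Wide layout: one row per Timestamp, one column per Point (A, B).
wide = df.set_index(['Timestamp', 'Point'])['Value'].unstack()

# True where a value differs from the previous Timestamp's value in the same
# column; the first row is compared with NaN, so it always counts as a change.
changed = wide.ne(wide.shift())

result = (
    wide.where(changed)          # unchanged readings become NaN
    .stack()                     # back to long form, indexed by (Timestamp, Point)
    .dropna()                    # explicit, in case stack() keeps the NaN rows
    .reset_index(name='Value')
)
print(result)
```

Two things to be aware of with this design: Value comes back as float because of the intermediate NaN, and the comparison is against the previous Timestamp row rather than the previous reading. If a point has no reading at some Timestamp, its next reading is compared with NaN and is kept even if it equals the last real value.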
You can try boolean indexing: first sort by Timestamp, group by Point, and check that the diff (difference between two consecutive rows) of Value is not equal to 0. The first reading of each Point has a diff of NaN, and NaN is not equal to 0, so it is kept:
df[df.sort_values('Timestamp').groupby('Point')['Value'].diff().ne(0)]
[out]
Timestamp Point Value
0 2019-09-01 A 1
1 2019-09-01 B 2
4 2019-09-03 A 3
5 2019-09-03 B 4
8 2019-09-05 A 1
9 2019-09-05 B 2
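A closely related variant, swapped in here rather than taken from the answer, compares each Value with the previous one in its group via `groupby(...).shift()` instead of `diff()`. Since `diff()` needs values that support subtraction, the `shift()` comparison may be preferable if Value can hold non-numeric data such as strings. This is a sketch assuming the sample CSV is loaded with `pd.read_csv`; the `reindex` step realigns the shifted values with the original row order so the boolean mask matches `df` exactly.

```python
import io

import pandas as pd

CSV = """Timestamp,Point,Value
2019-09-01,A,1
2019-09-01,B,2
2019-09-02,A,1
2019-09-02,B,2
2019-09-03,A,3
2019-09-03,B,4
2019-09-04,A,3
2019-09-04,B,4
2019-09-05,A,1
2019-09-05,B,2
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])

# Previous reading of the same Point (NaN for each Point's first reading).
prev = (
    df.sort_values('Timestamp', kind='stable')
    .groupby('Point')['Value']
    .shift()
    .reindex(df.index)   # realign with df's original row order
)

# Keep rows whose Value differs from the previous reading; comparing with NaN
# yields True, so each Point's first reading is kept.
result = df[df['Value'].ne(prev)]
print(result)
```

Unlike the reshape approach, this keeps the original integer dtype of Value and always compares against the previous reading of the same point, even when readings are irregular.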