
Address and smoothen noise in sensor data

I have sensor data as shown below. In the Data column, six rows with the value 45 sit between preceding and following rows with the value 50. The requirement is to clean this data by imputing 50 (the previous value) in the New_data column. The noise records (shown as 45 in the table) may vary in number or in level from one stretch of rows to another.

Case 1 (sample data):

Sl.no Timestamp Data New_data
1 1/1/2021 0:00:00 50 50
2 1/1/2021 0:15:00 50 50
3 1/1/2021 0:30:00 50 50
4 1/1/2021 0:45:00 50 50
5 1/1/2021 1:00:00 50 50
6 1/1/2021 1:15:00 50 50
7 1/1/2021 1:30:00 50 50
8 1/1/2021 1:45:00 50 50
9 1/1/2021 2:00:00 50 50
10 1/1/2021 2:15:00 50 50
11 1/1/2021 2:30:00 45 50
12 1/1/2021 2:45:00 45 50
13 1/1/2021 3:00:00 45 50
14 1/1/2021 3:15:00 45 50
15 1/1/2021 3:30:00 45 50
16 1/1/2021 3:45:00 45 50
17 1/1/2021 4:00:00 50 50
18 1/1/2021 4:15:00 50 50
19 1/1/2021 4:30:00 50 50
20 1/1/2021 4:45:00 50 50
21 1/1/2021 5:00:00 50 50
22 1/1/2021 5:15:00 50 50
23 1/1/2021 5:30:00 50 50

I am thinking of grouping the data, ordered by timestamp ascending (as below), and then checking group by group across the large sample: if group 1 has the same value as group 3, replace group 2 with the group 1 value.

Sl.no Timestamp Data New_data group
1 1/1/2021 0:00:00 50 50 1
2 1/1/2021 0:15:00 50 50 1
3 1/1/2021 0:30:00 50 50 1
4 1/1/2021 0:45:00 50 50 1
5 1/1/2021 1:00:00 50 50 1
6 1/1/2021 1:15:00 50 50 1
7 1/1/2021 1:30:00 50 50 1
8 1/1/2021 1:45:00 50 50 1
9 1/1/2021 2:00:00 50 50 1
10 1/1/2021 2:15:00 50 50 1
11 1/1/2021 2:30:00 45 50 2
12 1/1/2021 2:45:00 45 50 2
13 1/1/2021 3:00:00 45 50 2
14 1/1/2021 3:15:00 45 50 2
15 1/1/2021 3:30:00 45 50 2
16 1/1/2021 3:45:00 45 50 2
17 1/1/2021 4:00:00 50 50 3
18 1/1/2021 4:15:00 50 50 3
19 1/1/2021 4:30:00 50 50 3
20 1/1/2021 4:45:00 50 50 3
21 1/1/2021 5:00:00 50 50 3
22 1/1/2021 5:15:00 50 50 3
23 1/1/2021 5:30:00 50 50 3
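The grouping idea above can be sketched outside the database to make the rule concrete. This is a minimal Python illustration, not the PostgreSQL solution itself; the function name `assign_groups` is hypothetical, and it assumes the readings are already sorted by timestamp. In SQL the same numbering is commonly obtained by flagging value changes with `lag()` and taking a running sum of the flags.

```python
from itertools import groupby


def assign_groups(values):
    """Return one group number per reading.

    The number starts at 1 and increments every time the value
    differs from the reading just before it.
    """
    groups = []
    # groupby() yields one (value, run) pair per stretch of equal consecutive values.
    for group_no, (_, run) in enumerate(groupby(values), start=1):
        groups.extend(group_no for _ in run)
    return groups


# Case 1 from the question: ten 50s, six 45s, then seven 50s.
data = [50] * 10 + [45] * 6 + [50] * 7
group = assign_groups(data)
```

With the Case 1 data, `group` holds 1 for the first ten rows, 2 for the six noise rows, and 3 for the last seven rows, matching the group column in the table.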

There is also a need for an exception: if the next group follows a similar pattern, the data should be left unchanged. For example, if group 1 and group 3 are the same, impute group 2 with the group 1 value. But if group 2 and group 4 are the same, do not change group 3; retain its data in New_data.

Case 2:

Sl.no Timestamp Data New_data group
1 1/1/2021 0:00:00 50 50 1
2 1/1/2021 0:15:00 50 50 1
3 1/1/2021 0:30:00 50 50 1
4 1/1/2021 0:45:00 50 50 1
5 1/1/2021 1:00:00 50 50 1
6 1/1/2021 1:15:00 50 50 1
7 1/1/2021 1:30:00 50 50 1
8 1/1/2021 1:45:00 50 50 1
9 1/1/2021 2:00:00 50 50 1
10 1/1/2021 2:15:00 50 50 1
11 1/1/2021 2:30:00 45 50 2
12 1/1/2021 2:45:00 45 50 2
13 1/1/2021 3:00:00 45 50 2
14 1/1/2021 3:15:00 45 50 2
15 1/1/2021 3:30:00 45 50 2
16 1/1/2021 3:45:00 45 50 2
17 1/1/2021 4:00:00 50 50 3
18 1/1/2021 4:15:00 50 50 3
19 1/1/2021 4:30:00 50 50 3
20 1/1/2021 4:45:00 50 50 3
21 1/1/2021 5:00:00 50 50 3
22 1/1/2021 5:15:00 50 50 3
23 1/1/2021 5:30:00 50 50 3
24 1/1/2021 5:45:00 45 45 4
25 1/1/2021 6:00:00 45 45 4
26 1/1/2021 6:15:00 45 45 4
27 1/1/2021 6:30:00 45 45 4
28 1/1/2021 6:45:00 45 45 4
29 1/1/2021 7:00:00 45 45 4
30 1/1/2021 7:15:00 45 45 4
31 1/1/2021 7:30:00 45 45 4
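One possible reading of the rule and its exception is to walk the groups in timestamp order and compare each group's already cleaned predecessor with the raw value of its successor. The Python sketch below follows that reading; it is an assumption about the intended behaviour, the name `impute_noise` is hypothetical, and it is not claimed to match any particular SQL query on edge cases.

```python
from itertools import groupby


def impute_noise(values):
    """Return cleaned readings, one per input row.

    A run of equal values is replaced by the cleaned value of the
    run before it when that value equals the raw value of the run
    after it. The first and last runs are never changed.
    """
    # Collapse consecutive equal readings into (value, length) runs.
    runs = [(value, len(list(items))) for value, items in groupby(values)]

    cleaned = []  # one cleaned value per run, filled in timestamp order
    for i, (value, _) in enumerate(runs):
        has_both_neighbours = 0 < i < len(runs) - 1
        if has_both_neighbours and cleaned[i - 1] == runs[i + 1][0]:
            cleaned.append(cleaned[i - 1])  # treat this run as noise
        else:
            cleaned.append(value)           # keep the raw value

    # Expand the runs back to one value per row.
    result = []
    for (_, length), new_value in zip(runs, cleaned):
        result.extend([new_value] * length)
    return result


# Case 2 from the question: 50 x 10, 45 x 6, 50 x 7, 45 x 8.
case_2 = [50] * 10 + [45] * 6 + [50] * 7 + [45] * 8
new_data = impute_noise(case_2)
```

For Case 2 this yields 50 for rows 1 to 23 and 45 for rows 24 to 31, as in the New_data column: group 2 is imputed because its cleaned predecessor and raw successor are both 50, group 3 keeps its value because its neighbours are 50 and 45, and group 4 has no successor.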

I am looking for help coding this in PostgreSQL to address the scenario above. Please feel free to suggest alternative approaches to the problem.

The query below should answer the need.

  1. The first query identifies the rows where the data value changes, flagging the first and last row of each run.

  2. The second query groups the rows between two successive changes and builds the corresponding timestamp range.

  3. The third query is a recursive query that calculates new_data iteratively, in timestamp order.

  4. The last query displays the expected result.

     WITH RECURSIVE list AS (
       SELECT no,
              timestamp,
              lag(data) OVER w AS previous,
              data,
              lead(data) OVER w AS next,
              data IS DISTINCT FROM lag(data) OVER w AS first,
              data IS DISTINCT FROM lead(data) OVER w AS last
         FROM sensors
       WINDOW w AS (ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
     ), range_list AS (
       SELECT tsrange(timestamp, lead(timestamp) OVER w, '[]') AS range,
              previous,
              data,
              lead(next) OVER w AS next,
              first
         FROM list
        WHERE first OR last
       WINDOW w AS (ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
     ), rec_list (range, previous, data, next, new_data, arr) AS (
       SELECT range, previous, data, next, data, array[range]
         FROM range_list
        WHERE previous IS NULL
       UNION ALL
       SELECT c.range,
              p.data,
              c.data,
              c.next,
              CASE WHEN p.new_data IS NOT DISTINCT FROM c.next THEN p.data ELSE c.data END,
              p.arr || c.range
         FROM rec_list AS p
        INNER JOIN range_list AS c
           ON lower(c.range) = upper(p.range) + interval '15 minutes'
        WHERE NOT array[c.range] <@ p.arr
          AND first
     )
     SELECT s.*, r.new_data
       FROM sensors AS s
      INNER JOIN rec_list AS r
         ON r.range @> s.timestamp
      ORDER BY timestamp

    see the test result in dbfiddle
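To make steps 1 and 2 easier to follow, here is a rough Python analogue of what they compute: each row is compared with its neighbours to flag the first and last row of a run, and each run is then reduced to a closed timestamp range. The helper name `to_ranges` is hypothetical, and the sketch assumes rows sorted by timestamp with no NULL readings.

```python
from datetime import datetime, timedelta


def to_ranges(rows):
    """Collapse (timestamp, value) rows into (start, end, value) ranges.

    A new range starts on a row whose value differs from the row
    before it and ends on a row whose value differs from the row after it.
    """
    ranges = []
    start = None
    for i, (ts, value) in enumerate(rows):
        previous = rows[i - 1][1] if i > 0 else None
        following = rows[i + 1][1] if i + 1 < len(rows) else None
        is_first = value != previous   # like: data IS DISTINCT FROM lag(data)
        is_last = value != following   # like: data IS DISTINCT FROM lead(data)
        if is_first:
            start = ts
        if is_last:
            ranges.append((start, ts, value))
    return ranges


# Case 1 readings at 15-minute intervals starting 1/1/2021 0:00:00.
start_time = datetime(2021, 1, 1)
values = [50] * 10 + [45] * 6 + [50] * 7
rows = [(start_time + timedelta(minutes=15 * i), v) for i, v in enumerate(values)]
ranges = to_ranges(rows)
```

For Case 1 this gives three ranges: 0:00 to 2:15 with value 50, 2:30 to 3:45 with value 45, and 4:00 to 5:30 with value 50. The recursive step of the query then walks ranges of this kind in order, which is why it joins each range to the one starting 15 minutes after the previous range ends.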
