I am no data scientist. I do know python and I currently have to manage time series data that is coming in at a regular interval. Much of this data is all zero's or values that are the same for a long time, and to save memory I'd like to filter them out. Is there some standard method for this (which I am obviously unaware of) or should I implement my own algorithm?
What I want to achieve is the following:
interval value result
(summed)
1 0 0
2 0 # removed
3 0 0
4 1 1
5 2 2
6 2 # removed
7 2 # removed
8 2 2
9 0 0
10 0 0
Any help appreciated !
You could use pandas query on dataframes to achieve this:
import pandas as pd
matrix = [[1,0, 0],
[2, 0, 0],
[3, 0, 0],
[4, 1, 1],
[5, 2, 2],
[6, 2, 0],
[7, 2, 0],
[8, 2, 2],
[9, 0, 0],
[10,0, 0]]
df = pd.DataFrame(matrix, columns=list('abc'))
print(df.query("c != 0"))
There is no quick function call to do what you need. The following is one way
import pandas as pd
df = pd.DataFrame({'interval':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'value':[0, 0, 0, 1, 2, 2, 2, 2, 0, 0]}) # example dataframe
df['group'] = df['value'].ne(df['value'].shift()).cumsum() # column that increments every time the value changes
df['key'] = 1 # create column of ones
df['key'] = df.groupby('group')['key'].transform('cumsum') # get the cumulative sum
df['key'] = df.groupby('group')['key'].transform(lambda x: x.isin( [x.min(), x.max()])) # check which key is minimum and which is maximum by group
df = df[df['key']==True].drop(columns=['group', 'key']) # keep only relevant cases
df
Here is the code:
l = [0, 0, 0, 1, 2, 2, 2, 2, 0, 0]
for (i, ll) in enumerate(l):
if i != 0 and ll == l[i-1] and i<len(l)-1 and ll == l[i+1]:
continue
print(i+1, ll)
It produces what you want. You haven't specified format of your input data, so I assumed they're in a list. The conditions ll == l[i-1]
and ll == l[i+1]
are key to skipping the repeated values.
Thanks all. Looking at the answers I guess I can conclude I'll need to roll my own. I'll be using your input as inspiration. Thanks again !
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.