I have the following pandas DataFrame set up to import from a CSV:
df = pd.read_csv('file_path',
                 parse_dates={'timestamp': ['Date', 'Time']},
                 index_col='timestamp',
                 usecols=['Date', 'Time', 'X'])
So it ends up with a datetime index and an int64 column 'X' holding the values.
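Combining Date and Time through the dict form of parse_dates is deprecated in recent pandas releases (2.2 and later). A minimal sketch of an equivalent that builds the timestamp index explicitly; the inline CSV text is a stand-in for 'file_path' and assumes the Date,Time,X layout implied by usecols.

```python
import io

import pandas as pd

# Stand-in for 'file_path': a few rows in the assumed Date,Time,X layout
csv_text = """Date,Time,X
2015-08-25,16:52:10,95
2015-08-25,16:52:12,84
2015-08-25,16:52:14,86
"""

df = pd.read_csv(io.StringIO(csv_text), usecols=['Date', 'Time', 'X'])
# Join the two text columns into one datetime column and make it the index
df['timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df = df.set_index('timestamp').drop(columns=['Date', 'Time'])
print(df)
```

The result has the same shape as in the question: a DatetimeIndex named timestamp and a single int64 column X.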
My data looks like this with two columns:
X
timestamp
2015-08-25 16:52:10 95
2015-08-25 16:52:12 84
2015-08-25 16:52:14 86
2015-08-25 16:52:16 84
2015-08-25 16:52:18 85
2015-08-25 16:52:20 86
2015-08-25 16:52:22 84
2015-08-25 16:52:24 95
2015-08-25 16:52:28 95
2015-08-25 16:52:48 80
2015-08-25 16:52:50 85
2015-08-25 16:52:52 85
2015-08-25 16:52:54 84
2015-08-25 16:52:56 85
2015-08-25 16:52:58 86
2015-08-25 16:53:00 85
2015-08-25 16:53:02 85
2015-08-25 16:53:04 85
2015-08-25 16:53:06 86
2015-08-25 16:53:08 85
2015-08-25 16:53:10 85
The interval isn't always consistent, however. Sometimes data points are more than two seconds apart (e.g. 16:52:28 to 16:52:48).
My desired values are 84 <= X <= 86, but ONLY IF they persist for at least 10 continuous seconds.
So for my dataframe, I would want Python to return a count of 2: one for 16:52:12-16:52:22 and one for 16:52:50-16:53:10.
How do I tell Python not to count 16:52:50-16:53:10 (a 20-second run) as 2? I can code for a specific time interval, but how do I translate "at least Y continuous seconds" into Python?
Thanks in advance.
EDIT: To clarify, my preferred output is a count of how many times Event Y occurs within a sample set. Event Y occurs when X stays in the target range for at least 10 consecutive seconds. For example, if X stays between 84 and 86 for at least 10 consecutive seconds, that should count as 1.
I'm not sure exactly what you want to do, but here is an answer that should at least help clarify the expectations.
import pandas as pd

# Test data
df = pd.DataFrame([('2015-08-25 16:52:10', 95),
('2015-08-25 16:52:12', 84),
('2015-08-25 16:52:14', 86),
('2015-08-25 16:52:16', 84),
('2015-08-25 16:52:18', 85),
('2015-08-25 16:52:20', 86),
('2015-08-25 16:52:22', 84),
('2015-08-25 16:52:24', 95),
('2015-08-25 16:52:28', 95),
('2015-08-25 16:52:48', 80),
('2015-08-25 16:52:50', 85),
('2015-08-25 16:52:52', 85),
('2015-08-25 16:52:54', 84),
('2015-08-25 16:52:56', 85),
('2015-08-25 16:52:58', 86),
('2015-08-25 16:53:00', 85),
('2015-08-25 16:53:02', 85),
('2015-08-25 16:53:04', 85),
('2015-08-25 16:53:06', 86),
('2015-08-25 16:53:08', 85),
('2015-08-25 16:53:10', 85)],
columns=['timestamp', 'x'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
# Define a period column numbering the 10-second bin each value falls in.
# pd.TimeGrouper was removed in pandas 1.0, so compute the bin numbers directly.
start = df.index[0].floor('10s')
df['period'] = (df.index - start) // pd.Timedelta('10s')
# Group by period and value, and count how many times each distinct value occurs in each period
df = df.reset_index()
grouped = df.groupby(['period','x']).count()
print(grouped.head(10))
timestamp
period x
0 84 2
85 1
86 1
95 1
1 84 1
86 1
95 2
3 80 1
4 84 1
85 3
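The per-period counts above do not by themselves say whether a reading stayed in range for the whole stretch. As one hypothetical follow-up, the sketch below flags each 10-second bin whose readings all fall within 84-86 (the rule and the variable names are assumptions, not part of the answer). With the question's data only bins 4, 5 and 6 qualify; the 16:52:12-16:52:22 run is missed because it straddles bins 0 and 1, which is a limitation of fixed bins.

```python
import pandas as pd

# The question's readings (all on 2015-08-25)
stamps = ['16:52:10', '16:52:12', '16:52:14', '16:52:16', '16:52:18',
          '16:52:20', '16:52:22', '16:52:24', '16:52:28', '16:52:48',
          '16:52:50', '16:52:52', '16:52:54', '16:52:56', '16:52:58',
          '16:53:00', '16:53:02', '16:53:04', '16:53:06', '16:53:08',
          '16:53:10']
values = [95, 84, 86, 84, 85, 86, 84, 95, 95, 80,
          85, 85, 84, 85, 86, 85, 85, 85, 86, 85, 85]
df = pd.DataFrame({'x': values},
                  index=pd.to_datetime(['2015-08-25 ' + s for s in stamps]))

# Number the 10-second bins, starting from the bin containing the first reading
start = df.index[0].floor('10s')
period = (df.index - start) // pd.Timedelta('10s')

# Hypothetical rule: a bin qualifies if every reading in it is within 84-86
all_in_range = df['x'].between(84, 86).groupby(period).all()
qualifying = all_in_range[all_in_range].index.tolist()
print(qualifying)  # [4, 5, 6]
```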
Given your example:
>>> df
timestamp x
0 2015-08-25 16:52:10 95
1 2015-08-25 16:52:12 84
2 2015-08-25 16:52:14 86
3 2015-08-25 16:52:16 84
4 2015-08-25 16:52:18 85
5 2015-08-25 16:52:20 86
6 2015-08-25 16:52:22 84
7 2015-08-25 16:52:24 95
8 2015-08-25 16:52:28 95
9 2015-08-25 16:52:48 80
10 2015-08-25 16:52:50 85
11 2015-08-25 16:52:52 85
12 2015-08-25 16:52:54 84
13 2015-08-25 16:52:56 85
14 2015-08-25 16:52:58 86
15 2015-08-25 16:53:00 85
16 2015-08-25 16:53:02 85
17 2015-08-25 16:53:04 85
18 2015-08-25 16:53:06 86
19 2015-08-25 16:53:08 85
20 2015-08-25 16:53:10 85
First, add a column with the interval, in seconds, from each timestamp to the next one:
>>> tl=df['timestamp']
>>> df['interval']=[(tl[i+1]-tl[i]).total_seconds() for i, _ in enumerate(tl[:-1])]+[0]
>>> df
timestamp x interval
0 2015-08-25 16:52:10 95 2
1 2015-08-25 16:52:12 84 2
2 2015-08-25 16:52:14 86 2
3 2015-08-25 16:52:16 84 2
4 2015-08-25 16:52:18 85 2
5 2015-08-25 16:52:20 86 2
6 2015-08-25 16:52:22 84 2
7 2015-08-25 16:52:24 95 4
8 2015-08-25 16:52:28 95 20
9 2015-08-25 16:52:48 80 2
10 2015-08-25 16:52:50 85 2
11 2015-08-25 16:52:52 85 2
12 2015-08-25 16:52:54 84 2
13 2015-08-25 16:52:56 85 2
14 2015-08-25 16:52:58 86 2
15 2015-08-25 16:53:00 85 2
16 2015-08-25 16:53:02 85 2
17 2015-08-25 16:53:04 85 2
18 2015-08-25 16:53:06 86 2
19 2015-08-25 16:53:08 85 2
20 2015-08-25 16:53:10 85 0
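The same interval column can also be computed without a Python-level loop. A sketch using Series.diff and shift(-1) on a few of the question's timestamps; filling the last row with 0 mirrors the convention used above.

```python
import pandas as pd

# A few of the question's timestamps, including the irregular gaps
df = pd.DataFrame({'timestamp': pd.to_datetime([
    '2015-08-25 16:52:22', '2015-08-25 16:52:24',
    '2015-08-25 16:52:28', '2015-08-25 16:52:48'])})

# diff() gives the gap to the previous row; shift(-1) moves it to "gap to the next row".
# The last row has no successor, so its NaN is filled with 0.
df['interval'] = df['timestamp'].diff().shift(-1).dt.total_seconds().fillna(0)
print(df)
```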
Now use itertools.groupby to split the rows into runs that share the same interval, and print each run spanning at least 10 seconds:
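from collections import Counter
from itertools import groupby

fmt = '{} sec interval between {} and {} every {} seconds\n\tx={}, count={}\n'
# Group consecutive rows that share the same interval to the next timestamp
for k, l in groupby(df.iterrows(), key=lambda row: row[1]['interval']):
    li = list(l)
    # Span of the run: from its first to its last timestamp
    t2, t1 = li[-1][1]['timestamp'], li[0][1]['timestamp']
    ti = (t2 - t1).total_seconds()
    if ti >= 10.0:
        data = [e[1]['x'] for e in li]
        print(fmt.format(ti, t1, t2, k, data, Counter(data)))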
Prints:
12.0 sec interval between 2015-08-25 16:52:10 and 2015-08-25 16:52:22 every 2.0 seconds
x=[95, 84, 86, 84, 85, 86, 84], count=Counter({84: 3, 86: 2, 85: 1, 95: 1})
20.0 sec interval between 2015-08-25 16:52:48 and 2015-08-25 16:53:08 every 2.0 seconds
x=[80, 85, 85, 84, 85, 86, 85, 85, 85, 86, 85], count=Counter({85: 7, 86: 2, 80: 1, 84: 1})
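To get the single number the EDIT asks for, the run idea can be taken one step further: split the rows into runs where X stays within 84-86, break a run on any gap larger than the usual 2-second spacing, and count the runs lasting at least 10 seconds. This is a sketch under those assumptions; count_events and its parameters are hypothetical names, and measuring a run's duration as its last timestamp minus its first is an assumed definition of "continuous".

```python
import pandas as pd


def count_events(x, low=84, high=86, min_seconds=10, max_gap_seconds=2):
    """Count runs in which x stays within [low, high] for at least min_seconds.

    x is a Series indexed by a sorted DatetimeIndex. A run ends when a value
    leaves the range or when two consecutive samples are more than
    max_gap_seconds apart (assumed definition of "continuous").
    """
    in_range = x.between(low, high)
    times = x.index.to_series()
    gap = times.diff().dt.total_seconds()

    # A new run starts when the in/out-of-range state flips or the gap is too large
    new_run = in_range.ne(in_range.shift()) | (gap > max_gap_seconds)
    run_id = new_run.cumsum()

    # Duration of each in-range run = last timestamp minus first timestamp
    grouped = times[in_range].groupby(run_id[in_range])
    durations = (grouped.max() - grouped.min()).dt.total_seconds()

    # Each qualifying run counts once, however long it lasts
    return int((durations >= min_seconds).sum())


# The question's readings (all on 2015-08-25)
stamps = ['16:52:10', '16:52:12', '16:52:14', '16:52:16', '16:52:18',
          '16:52:20', '16:52:22', '16:52:24', '16:52:28', '16:52:48',
          '16:52:50', '16:52:52', '16:52:54', '16:52:56', '16:52:58',
          '16:53:00', '16:53:02', '16:53:04', '16:53:06', '16:53:08',
          '16:53:10']
values = [95, 84, 86, 84, 85, 86, 84, 95, 95, 80,
          85, 85, 84, 85, 86, 85, 85, 85, 86, 85, 85]
x = pd.Series(values, index=pd.to_datetime(['2015-08-25 ' + s for s in stamps]))

print(count_events(x))  # 2
```

Because each run is counted once, the 20-second run from 16:52:50 to 16:53:10 contributes 1, not 2. Raising min_seconds or tightening max_gap_seconds changes what counts as an event.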