I have a pandas DataFrame (df) with many columns. For simplicity, I am posting three columns with dummy data here.
Timestamp Source Length
0 1 5
1 1 5
2 1 5
3 2 5
4 2 5
5 3 5
6 1 5
7 3 5
8 2 5
9 1 5
Using pandas functions, I first set the timestamp as the index of the df.
index = pd.DatetimeIndex(data[data.columns[1]]*10**9) # Convert timestamp
df = df.set_index(index) # Set Timestamp as index
Next, I can use groupby and pd.TimeGrouper to group the data into 5-second bins and compute the total length for each bin as follows:
df_length = data[data.columns[5]].groupby(pd.TimeGrouper('5S')).sum()
So df_length should look like:
Timestamp Length
0 25
5 25
Now the problem is: I want to get the same 5-second bins, but I want to compute the total length for each source (1, 2 and 3) in separate columns, in the following format:
Timestamp 1 2 3
0 15 10 0
5 10 5 10
I think I can use df.groupby with some conditions to get it, but I am confused and tired now :(
I would appreciate a solution using pandas functions only.
You can pass the Source column as a second key to groupby, which gives a result with a MultiIndex, and then reshape with unstack, which moves the last level of the MultiIndex into the columns:
print (df[df.columns[2]].groupby([pd.TimeGrouper('5S'), df['Source']]).sum())
Timestamp            Source
1970-01-01 00:00:00  1         15
                     2         10
1970-01-01 00:00:05  1         10
                     2          5
                     3         10
Name: Length, dtype: int64
df1 = (df[df.columns[2]].groupby([pd.TimeGrouper('5S'), df['Source']])
       .sum()
       .unstack(fill_value=0))
print (df1)
Source                1   2   3
Timestamp
1970-01-01 00:00:00  15  10   0
1970-01-01 00:00:05  10   5  10
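Note that pd.TimeGrouper was deprecated and later removed from pandas, so the code above only runs on older versions. Below is a minimal, self-contained sketch of the same idea using pd.Grouper(freq=...) instead. It rebuilds the dummy data from the question and assumes the Timestamp values are seconds since the epoch, which the question implies but does not state outright.

```python
import pandas as pd

# Dummy data from the question.
df = pd.DataFrame({
    "Timestamp": range(10),
    "Source": [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length": [5] * 10,
})

# Assumption: the integer timestamps are seconds since the epoch.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Timestamp")

# Group by 5-second bins (taken from the index) and by Source,
# sum Length, then move Source from the index into the columns.
df1 = (
    df.groupby([pd.Grouper(freq="5s"), "Source"])["Length"]
      .sum()
      .unstack(fill_value=0)
)
print(df1)
```

fill_value=0 is what puts a 0 in the cells for bins where a source never appears, such as source 3 in the first bin.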