Grouping column data in Pandas Dataframes

I have a pandas DataFrame (df) with many columns. For simplicity, I am posting three columns with dummy data here.

Timestamp    Source    Length
0            1              5
1            1              5
2            1              5
3            2              5
4            2              5
5            3              5
6            1              5
7            3              5
8            2              5
9            1              5
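To make the example reproducible, the dummy table above can be built directly. This is a minimal sketch that assumes the column names are exactly Timestamp, Source and Length, as in the table header:

```python
import pandas as pd

# Dummy data from the table: one row per second, three sources,
# and every row has Length 5.
df = pd.DataFrame({
    "Timestamp": list(range(10)),
    "Source":    [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length":    [5] * 10,
})
print(df)
```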

Using pandas functions, I first set the timestamp as the index of df:

import pandas as pd

index = pd.DatetimeIndex(df['Timestamp'] * 10**9)  # seconds -> nanoseconds since the epoch
df = df.set_index(index)  # use Timestamp as the index
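If the raw timestamps are whole seconds (the dummy data suggests so, but that is an assumption about the real data), pd.to_datetime with unit='s' is a slightly more explicit way to build the same DatetimeIndex without the manual *10**9 scaling. A sketch on the dummy data:

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": list(range(10)),
    "Source":    [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length":    [5] * 10,
})

# Interpret the integers as seconds since the Unix epoch. The resulting
# index is named "Timestamp" (taken from the Series name); the original
# Timestamp column is kept as a regular column.
df = df.set_index(pd.to_datetime(df["Timestamp"], unit="s"))
print(df.head())
```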

Next, I can use groupby with pd.Grouper (spelled pd.TimeGrouper in older pandas versions) to group the data into 5-second bins and compute the cumulative length for each bin:

df_length = df['Length'].groupby(pd.Grouper(freq='5s')).sum()

So df_length (a Series) should look like:

Timestamp     Length
0             25
5             25
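For a single total per bin, resample() on the DatetimeIndex is shorthand for grouping by pd.Grouper(freq=...). A sketch on the dummy data, assuming the integer timestamps are seconds:

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": list(range(10)),
    "Source":    [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length":    [5] * 10,
})
df = df.set_index(pd.to_datetime(df["Timestamp"], unit="s"))

# Sum Length inside each 5-second bin of the DatetimeIndex.
df_length = df["Length"].resample("5s").sum()
print(df_length)
```

Both spellings should give the same two bins of 25 for this data.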

Now the problem is: I want the same 5-second bins, but I want to compute the cumulative length for each source (1, 2 and 3) in a separate column, in the following format:

Timestamp    1     2     3
0            15    10    0
5            10    5     10

I think df.groupby with some conditions could do this, but I'm confused and tired now :(

I'd appreciate a solution using pandas functions only.

You can add the Source column as a second groupby key, which gives a Series with a MultiIndex, and then reshape it by unstacking the last level of the MultiIndex into columns:

print(df['Length'].groupby([pd.Grouper(freq='5s'), df['Source']]).sum())
Timestamp            Source
1970-01-01 00:00:00  1         15
                     2         10
1970-01-01 00:00:05  1         10
                     2          5
                     3         10
Name: Length, dtype: int64

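# Parentheses let the method chain span several lines.
# unstack() moves the last index level (Source) into columns;
# fill_value=0 covers (bin, source) pairs with no rows.
df1 = (df['Length'].groupby([pd.Grouper(freq='5s'), df['Source']])
                   .sum()
                   .unstack(fill_value=0))
print(df1)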
Source                1   2   3
Timestamp                      
1970-01-01 00:00:00  15  10   0
1970-01-01 00:00:05  10   5  10
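As an alternative to groupby plus unstack (not part of the answer above), pivot_table can do the binning and the reshape in one call, since its index argument accepts a pd.Grouper. A sketch on the dummy data, assuming the integer timestamps are seconds:

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": list(range(10)),
    "Source":    [1, 1, 1, 2, 2, 3, 1, 3, 2, 1],
    "Length":    [5] * 10,
})
df = df.set_index(pd.to_datetime(df["Timestamp"], unit="s"))

# Rows: 5-second bins of the DatetimeIndex; columns: one per Source value.
# aggfunc="sum" totals Length per (bin, source); fill_value=0 covers
# combinations that never occur (source 3 in the first bin).
df1 = df.pivot_table(
    index=pd.Grouper(freq="5s"),
    columns="Source",
    values="Length",
    aggfunc="sum",
    fill_value=0,
)
print(df1)
```

fill_value=0 plays the same role here as in unstack(fill_value=0).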
