
Convert column values for a group of data frame rows into a list in the column

For this question, let's take the following example. I have a dataframe which looks as follows ( df.head() ):

   Unnamed: 0  PacketTime  FrameLen  FrameCapLen  ...  Speed  Delay  Loss  Interval
0           1    0.056078       116          116  ...     25      0     0         0
1           2    0.056106        66           66  ...     25      0     0         0
2           3    2.058089       116          116  ...     25      0     0         2
3           4    2.058115        66           66  ...     25      0     0         2
4           5    4.060316       116          116  ...     25      0     0         4

[5 rows x 23 columns]

As you can see, the rows fall into groups by the Interval column. I know that pandas has df.groupby(colname), but what I want is to collapse each interval group into a single row in which the column values are listed together. An example of the desired output:

   Unnamed: 0  PacketTime  FrameLen  FrameCapLen  ...  Speed  Delay  Loss  Interval
0           1    0.000028    116,66       116,66  ...  25,25    0,0   0,0         0
1           2    0.000026    116,66       116,66  ...  25,25    0,0   0,0         2
...

[5 rows x 23 columns]

The desired end result is that, for each interval group, the column values are collected into a list and PacketTime is replaced by max(PacketTime) - min(PacketTime).

These are two separate tasks. For both, we can rely on the fact that a groupby operation does the following:

1. Split a single data frame into multiple data frames based on a single column.
2. Apply an operation to each data frame.
3. Stitch the resulting data frames together.
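The three steps can be spelled out by hand on a toy frame. This is only a sketch: the frame `toy` and its values are made up for illustration, and the sum is an arbitrary stand-in for "an operation".

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the question, values are illustrative.
toy = pd.DataFrame({
    "Interval": [0, 0, 2, 2, 4],
    "FrameLen": [116, 66, 116, 66, 116],
})

# 1. Split: iterating over a groupby yields (key, sub-frame) pairs.
pieces = {key: part for key, part in toy.groupby("Interval")}

# 2. Apply: run an operation on each sub-frame separately.
sums = {key: part["FrameLen"].sum() for key, part in pieces.items()}

# 3. Combine: stitch the per-group results back into one object.
manual = pd.Series(sums, name="FrameLen").rename_axis("Interval")

# The built-in groupby performs all three steps in a single call.
builtin = toy.groupby("Interval")["FrameLen"].sum()
```

Both `manual` and `builtin` hold one value per interval, which is the shape we are after in both jobs.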

First job:

Have a single line per interval for all columns other than PacketTime, where each value is a list of that group's values.

We want to collect values into a list, so let's use Series.to_list() for that. For a reason unknown to me, calling df.apply(lambda s: s.to_list()) column by column won't work: pandas automatically converts the lists back into normal columns. Calling it on rows, however, returns what we want: a Series of lists. So we transpose the frame to turn columns into rows, then apply to_list to the rows (which are the former columns).
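The difference between the two directions can be seen on a tiny frame. This is a minimal sketch; the frame `toy` is a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show the two behaviours.
toy = pd.DataFrame({"FrameLen": [116, 66], "Speed": [25, 25]})

# Column-wise apply: the equal-length lists are expanded back into a DataFrame.
by_column = toy.apply(lambda s: s.to_list())

# Row-wise apply on the transpose: each list is kept as one value of a Series.
by_row = toy.T.apply(lambda s: s.to_list(), axis="columns")
```

Here `by_column` is again a DataFrame, while `by_row` is a Series indexed by the original column names whose values are lists.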

Example

df.T.apply(lambda series: series.to_list(), axis='columns')

results in:

PacketTime     [0.056078, 0.056106, 2.058089, 2.058115, 4.060...
FrameLen                       [116.0, 66.0, 116.0, 66.0, 116.0]
FrameCapLen                    [116.0, 66.0, 116.0, 66.0, 116.0]
Unnamed: 3                             [nan, nan, nan, nan, nan]
Speed                             [25.0, 25.0, 25.0, 25.0, 25.0]
Delay                                  [0.0, 0.0, 0.0, 0.0, 0.0]
Loss                                   [0.0, 0.0, 0.0, 0.0, 0.0]
Interval                               [0.0, 0.0, 2.0, 2.0, 4.0]

This is exactly what we want for each Interval. So let's define it as a function and apply it to each interval then, right?!


import pandas as pd

df = pd.read_excel('example.xlsx')


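def to_list(group):
    # Transpose so each original column becomes a row, then collect every row into a list.
    return group.T.apply(lambda row: row.to_list(), axis='columns')


# One row per Interval; PacketTime is dropped here because it is handled separately.
df_other = (
    df.groupby('Interval')
      .apply(to_list)
      .drop(columns='PacketTime')
)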

Second job:

To calculate the duration, all we need is a function that subtracts the minimum time from the maximum time to get the length of the interval:

     
def min_max(s):
    return s.max()-s.min()
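As a quick check, the function can be tried on the two PacketTime values of the first interval from df.head(). This is just a sketch; the Series `times` is built by hand here rather than taken from the real file.

```python
import pandas as pd


def min_max(s):
    # Span of the values: largest minus smallest.
    return s.max() - s.min()


# PacketTime values of Interval 0, copied from df.head() in the question.
times = pd.Series([0.056078, 0.056106])
duration = min_max(times)
```

Up to floating-point rounding, `duration` is 0.000028, which matches the first row of the desired output.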

Now we just apply it per interval and join the two results together:

s_Interval = (
    df.groupby('Interval')['PacketTime']
      .apply(min_max)
)
final_df = pd.concat([df_other, s_Interval], axis='columns')

We end up with:


print(final_df.to_markdown())
|   Interval | FrameLen      | FrameCapLen   | Unnamed: 3   | Speed        | Delay      | Loss       | Interval   |   PacketTime |
|-----------:|:--------------|:--------------|:-------------|:-------------|:-----------|:-----------|:-----------|-------------:|
|          0 | [116.0, 66.0] | [116.0, 66.0] | [nan, nan]   | [25.0, 25.0] | [0.0, 0.0] | [0.0, 0.0] | [0.0, 0.0] |      2.8e-05 |
|          2 | [116.0, 66.0] | [116.0, 66.0] | [nan, nan]   | [25.0, 25.0] | [0.0, 0.0] | [0.0, 0.0] | [2.0, 2.0] |      2.6e-05 |
|          4 | [116.0]       | [116.0]       | [nan]        | [25.0]       | [0.0]      | [0.0]      | [4.0]      |      0       |
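As a possible alternative to the transpose approach, both jobs can be done in one groupby call by passing a dict to agg. This is a different technique from the one above, shown only as a sketch: the frame below is a hypothetical stand-in for the Excel file, restricted to a few of the columns from df.head().

```python
import pandas as pd

# Hypothetical stand-in for the Excel file; values copied from df.head() in the question.
df = pd.DataFrame({
    "PacketTime": [0.056078, 0.056106, 2.058089, 2.058115, 4.060316],
    "FrameLen": [116, 66, 116, 66, 116],
    "Speed": [25, 25, 25, 25, 25],
    "Interval": [0, 0, 2, 2, 4],
})

# One aggregation per column: lists for the ordinary columns,
# max - min for PacketTime.
result = df.groupby("Interval").agg({
    "FrameLen": list,
    "Speed": list,
    "PacketTime": lambda s: s.max() - s.min(),
})
```

With this variant the values keep their original integer dtype inside the lists, because no transpose is involved. For all 23 columns the dict would have to be built from df.columns rather than written out by hand.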



