Convert column values for a group of data frame rows into a list in the column
For this question, let's take the following example. I have a dataframe which looks as follows (df.head()):
Unnamed: 0 PacketTime FrameLen FrameCapLen ... Speed Delay Loss Interval
0 1 0.056078 116 116 ... 25 0 0 0
1 2 0.056106 66 66 ... 25 0 0 0
2 3 2.058089 116 116 ... 25 0 0 2
3 4 2.058115 66 66 ... 25 0 0 2
4 5 4.060316 116 116 ... 25 0 0 4
[5 rows x 23 columns]
As you can see, the groups are defined by the Interval column. I know that pandas has df.groupby(colname), but what I wish to do is group the rows of each interval so that the column values are listed together. This would give an example output as follows:
Unnamed: 0 PacketTime FrameLen FrameCapLen ... Speed Delay Loss Interval
0 1 0.000028 116,66 116,66 ... 25,25 0,0 0,0 0
1 2 0.000026 116,66 116,66 ... 25,25 0,0 0,0 2
...
[5 rows x 23 columns]
As you can see, the desired end result is to have the columns grouped into a list for each interval group, with the packet time combined such that the value is max(PacketTime)-min(PacketTime) for each interval group.
These are two separate tasks. For both, let's use the fact that a groupby operation does the following:
Split a single data frame into multiple data frames based on a single column.
Apply an operation to each data frame.
Stitch the resulting data frames together.
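The three steps can be spelled out by hand and compared with the built-in groupby. This is only a minimal sketch: the small frame below is made up from the first rows shown in the question, and the sum over FrameLen is an arbitrary stand-in for "an operation".

```python
import pandas as pd

# Illustrative data only: a few columns copied from the question's df.head().
df = pd.DataFrame({
    "PacketTime": [0.056078, 0.056106, 2.058089, 2.058115, 4.060316],
    "FrameLen": [116, 66, 116, 66, 116],
    "Interval": [0, 0, 2, 2, 4],
})

# Split: one sub-frame per distinct Interval value.
pieces = {key: group for key, group in df.groupby("Interval")}

# Apply: run an operation on each sub-frame (here, a sum of FrameLen).
sums = {key: group["FrameLen"].sum() for key, group in pieces.items()}

# Combine: stitch the per-group results back into one object.
manual = pd.Series(sums, name="FrameLen")
manual.index.name = "Interval"

# The built-in groupby does all three steps in one call.
builtin = df.groupby("Interval")["FrameLen"].sum()
print(manual.to_dict() == builtin.to_dict())
```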
First job:
Have a single line per interval for all columns other than PacketTime, where each value is a list of the values in that interval.
We want to stitch the values into a list, so let's use series.to_list() for that. For a reason unknown to me, calling df.apply(lambda s: s.to_list()) won't work: pandas automatically converts the lists back to normal columns. However, calling it on rows returns what we want: a series of lists. Thus we will transpose the frame so that columns become rows, and apply to_list on the rows (which are the former columns).
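The difference between the two calls can be checked on a small frame. This is a sketch with made-up data, and the behaviour is as I understand it from current pandas versions: a column-wise apply that returns equal-length lists is reassembled into a DataFrame, while a row-wise apply that returns lists yields a Series of lists.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "FrameLen": [116, 66, 116],
    "Speed": [25, 25, 25],
})

# Column-wise apply: the equal-length lists are rebuilt into a DataFrame,
# so nothing appears to change.
by_column = df.apply(lambda s: s.to_list())
print(type(by_column).__name__)

# Transpose first, then apply row-wise: each former column becomes one list.
by_row = df.T.apply(lambda s: s.to_list(), axis="columns")
print(type(by_row).__name__)
print(by_row["FrameLen"])
```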
Example
df.T.apply(lambda series: series.to_list(), axis='columns')
results in:
PacketTime [0.056078, 0.056106, 2.058089, 2.058115, 4.060...
FrameLen [116.0, 66.0, 116.0, 66.0, 116.0]
FrameCapLen [116.0, 66.0, 116.0, 66.0, 116.0]
Unnamed: 3 [nan, nan, nan, nan, nan]
Speed [25.0, 25.0, 25.0, 25.0, 25.0]
Delay [0.0, 0.0, 0.0, 0.0, 0.0]
Loss [0.0, 0.0, 0.0, 0.0, 0.0]
Interval [0.0, 0.0, 2.0, 2.0, 4.0]
This is exactly what we want for each Interval. So let's define it as a function and apply it to each interval then, right?!
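import pandas as pd

df = pd.read_excel('example.xlsx')

def to_list(df):
    # Transpose so each former column is a row, then collect each row into a list.
    return df.T.apply(lambda x: x.to_list(), axis='columns')

# One row per interval; PacketTime is dropped here and handled separately.
df_other = df.groupby('Interval')\
    .apply(to_list)\
    .drop(columns='PacketTime')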
Second job:
To calculate the duration, all we need is a function that takes the minimum and the maximum of the time and subtracts one from the other to get the length of time:
def min_max(s):
    return s.max() - s.min()
Now we just apply it and join the two dfs together:
s_Interval = df.groupby('Interval')['PacketTime']\
    .apply(min_max)
final_df = pd.concat([df_other, s_Interval], axis='columns')
We end up with:
print(final_df.to_markdown())
| Interval | FrameLen | FrameCapLen | Unnamed: 3 | Speed | Delay | Loss | Interval | PacketTime |
|-----------:|:--------------|:--------------|:-------------|:-------------|:-----------|:-----------|:-----------|-------------:|
| 0 | [116.0, 66.0] | [116.0, 66.0] | [nan, nan] | [25.0, 25.0] | [0.0, 0.0] | [0.0, 0.0] | [0.0, 0.0] | 2.8e-05 |
| 2 | [116.0, 66.0] | [116.0, 66.0] | [nan, nan] | [25.0, 25.0] | [0.0, 0.0] | [0.0, 0.0] | [2.0, 2.0] | 2.6e-05 |
| 4 | [116.0] | [116.0] | [nan] | [25.0] | [0.0] | [0.0] | [4.0] | 0 |
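As an aside, a similar result can probably be reached without transposing, by passing list directly to the groupby aggregation. This is an alternative sketch rather than the approach used above; the frame is made up from the question's sample rows, and the names lists, duration and result are only illustrative.

```python
import pandas as pd

# Illustrative data only: a few columns copied from the question's df.head().
df = pd.DataFrame({
    "PacketTime": [0.056078, 0.056106, 2.058089, 2.058115, 4.060316],
    "FrameLen": [116, 66, 116, 66, 116],
    "Speed": [25, 25, 25, 25, 25],
    "Interval": [0, 0, 2, 2, 4],
})

grouped = df.groupby("Interval")

# First job: collect every non-time column into one list per interval.
lists = grouped[["FrameLen", "Speed"]].agg(list)

# Second job: duration of each interval as max(PacketTime) - min(PacketTime).
duration = grouped["PacketTime"].agg(lambda s: s.max() - s.min())

# Both results are indexed by Interval, so they can be joined directly.
result = lists.join(duration)
print(result)
```

Because no transpose is involved, the integer columns are not upcast to float, so the lists hold 116 and 66 rather than 116.0 and 66.0.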