Create new columns in a pandas df by grouping and performing operations on existing columns

Question

I have a dataframe that looks like this (Minimal Reproducible Example):

import pandas as pd

# Thermometer IDs have the form '<group>_<reading number>'
thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']

temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]

df_set = {'thermometers': thermometers,
          'Temperatures': temperatures}

df = pd.DataFrame(df_set)

Index	Thermometer	Temperature
0	T-10000_0001	14.9
1	T-10000_0002	12.7
2	T-10000_0003	12.7
3	T-10000_0004	10.8
4	T-10001_0001	19.8
5	T-10001_0002	18.3
6	T-10001_0003	17.7
7	T-10001_0004	18.1
8	T-10002_0001	20.0
9	T-10002_0002	16.4
10	T-10002_0003	17.6
11	T-10002_0004	19.3

I am trying to group the thermometers (i.e. 'T-10000', 'T-10001', 'T-10002') and create new columns with the min, max and average of each thermometer's readings. So my final data frame would look like this:

Index	Thermometer	min_temp	average_temp	max_temp
0	T-10000	10.8	12.8	14.9
1	T-10001	17.7	18.5	19.8
2	T-10002	16.4	18.3	20.0

I tried creating a separate function, which I think requires a regular expression, but I'm unable to figure out how to go about it. Any help will be much appreciated.

Answer 1

Use groupby, splitting on your delimiter _ to get the group key. Then just aggregate with whatever functions you need.

>>> # Group on the text before '_' and aggregate only the numeric column
>>> df.groupby(df['thermometers']
...              .str.split('_')
...              .str.get(0))['Temperatures'].agg(['min', 'mean', 'max'])

                      min    mean   max
thermometers                           
T-10000              10.8  13.375  15.1
T-10001              17.7  18.475  19.8
T-10002              16.4  18.325  20.0
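
To get the exact layout asked for in the question (a regular Thermometer column plus min_temp, average_temp and max_temp), the same grouping can be combined with named aggregation and reset_index. This is only a sketch: the names group_key and result are illustrative and not part of the original answer, and keyword-style agg assumes pandas 0.25 or later.

```python
import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]
df = pd.DataFrame({'thermometers': thermometers, 'Temperatures': temperatures})

# Group key: the part of each ID before the underscore, e.g. 'T-10000'.
# Renaming the Series sets the name of the resulting column (illustrative name).
group_key = df['thermometers'].str.split('_').str.get(0).rename('Thermometer')

result = (
    df.groupby(group_key)['Temperatures']
      # Named aggregation: output column name = aggregation function
      .agg(min_temp='min', average_temp='mean', max_temp='max')
      # Turn the group key from the index back into an ordinary column
      .reset_index()
)

print(result)
```

Round the averages afterwards (for example with result.round(1)) if you want one decimal place as in the question's table.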

Answer 2

Another approach, using str.extract to avoid the call to str.get:

(df['Temperatures']
 .groupby(df['thermometers'].str.extract('(^[^_]+)', expand=False))
 .agg(['min', 'mean'])
 )

Output:

               min    mean
thermometers              
T-10000       10.8  13.375
T-10001       17.7  18.475
T-10002       16.4  18.325
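
To see what the pattern does on its own, here is a small sketch on three sample IDs. The variable names (ids, by_extract, by_split) are illustrative only. The regular expression '^[^_]+' matches one or more non-underscore characters from the start of the string, so for IDs of this shape it should give the same key as splitting on '_' and taking the first piece.

```python
import pandas as pd

ids = pd.Series(['T-10000_0001', 'T-10001_0003', 'T-10002_0004'],
                name='thermometers')

# Capture everything from the start of the string up to the first underscore.
# expand=False returns a Series instead of a one-column DataFrame.
by_extract = ids.str.extract(r'(^[^_]+)', expand=False)

# Equivalent key from the split-based approach, for comparison
by_split = ids.str.split('_').str.get(0)

print(by_extract.tolist())
print(by_extract.tolist() == by_split.tolist())
```

One difference to keep in mind: if an ID were an empty string, str.extract would yield NaN, whereas split followed by get(0) would return the empty string.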
