Create new columns in pandas df by grouping and performing operations on an existing column
I have a dataframe that looks like this (minimal reproducible example):
import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]
df_set = {'thermometers': thermometers,
          'Temperatures': temperatures}
df = pd.DataFrame(df_set)
Index | Thermometer | Temperature |
---|---|---|
0 | T-10000_0001 | 15.1 |
1 | T-10000_0002 | 14.9 |
2 | T-10000_0003 | 12.7 |
3 | T-10000_0004 | 10.8 |
4 | T-10001_0001 | 19.8 |
5 | T-10001_0002 | 18.3 |
6 | T-10001_0003 | 17.7 |
7 | T-10001_0004 | 18.1 |
8 | T-10002_0001 | 20.0 |
9 | T-10002_0002 | 16.4 |
10 | T-10002_0003 | 17.6 |
11 | T-10002_0004 | 19.3 |
I am trying to group the thermometers by their prefix (i.e. 'T-10000', 'T-10001', 'T-10002') and create new columns with the min, max and average of each group's readings. So my final dataframe would look like this:
Index | Thermometer | min_temp | average_temp | max_temp |
---|---|---|---|---|
0 | T-10000 | 10.8 | 13.4 | 15.1 |
1 | T-10001 | 17.7 | 18.5 | 19.8 |
2 | T-10002 | 16.4 | 18.3 | 20.0 |
I tried writing a separate function, which I think requires a regular expression, but I can't figure out how to go about it. Any help will be much appreciated.
Use groupby on the part of each name before your delimiter _ (split on it and take the first piece). Then just aggregate the Temperatures column with whatever functions you need.
>>> (df.groupby(df['thermometers'].str.split('_').str.get(0))['Temperatures']
...    .agg(['min', 'mean', 'max']))
               min    mean   max
thermometers
T-10000       10.8  13.375  15.1
T-10001       17.7  18.475  19.8
T-10002       16.4  18.325  20.0
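To get the column names from the question (min_temp, average_temp, max_temp) and a plain integer index, named aggregation is one option. This is only a sketch of the layout the question seems to ask for: the `group` and `summary` variable names, the `Thermometer` column label, and the rounding to one decimal are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]
df = pd.DataFrame({'thermometers': thermometers, 'Temperatures': temperatures})

# Group key: the part of each name before the underscore.
# Renaming the key (an assumption) controls the label of the resulting column.
group = df['thermometers'].str.split('_').str.get(0).rename('Thermometer')

summary = (
    df.groupby(group)['Temperatures']
      # Named aggregation: keyword = output column, value = aggregation function.
      .agg(min_temp='min', average_temp='mean', max_temp='max')
      .round(1)        # illustrative choice: one decimal, as in the question
      .reset_index()   # turn the group key into an ordinary column
)
print(summary)
```

Drop the `.round(1)` call if you want the unrounded means.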
Another approach uses str.extract, which avoids the call to str.get:
(df['Temperatures']
.groupby(df['thermometers'].str.extract('(^[^_]+)', expand=False))
.agg(['min', 'mean'])
)
Output:
               min    mean
thermometers
T-10000       10.8  13.375
T-10001       17.7  18.475
T-10002       16.4  18.325
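If the statistics should instead sit next to every original reading (a literal reading of "create new columns", which is an assumption about the intent), `transform` broadcasts each group's result back onto its rows. A minimal sketch, reusing the same str.extract key; the `key` and `grouped` names and the new column names are illustrative:

```python
import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]
df = pd.DataFrame({'thermometers': thermometers, 'Temperatures': temperatures})

# Group key: everything before the first underscore.
key = df['thermometers'].str.extract(r'(^[^_]+)', expand=False)
grouped = df.groupby(key)['Temperatures']

# transform returns a Series aligned with df's index, one value per original row.
df['min_temp'] = grouped.transform('min')
df['average_temp'] = grouped.transform('mean')
df['max_temp'] = grouped.transform('max')
print(df)
```

The frame keeps all 12 rows; every row of a group carries that group's min, mean and max.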