Pandas dataframe column conditional on another column
import pandas as pd
import urllib.request
import numpy as np
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20170811_da_bc.xls"
cnstxls = urllib.request.urlopen(url)
xl = pd.ExcelFile(cnstxls)
df = xl.parse("Sheet1",skiprows=3)
constr = df.iloc[:,1:7]
constr['Class'] = np.where(constr['Hour of Occurrence'] == (1,2,3,4,5,6), 'Offpeak', 'Onpeak')
sumsp=constr.groupby('Constraint_ID','Class',axis=0)['Shadow Price'].sum().sort_values(ascending=True)
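1) Creating the new Class column raises TypeError: invalid type comparison. How can I set this new column based on several hours? It works when I compare against a single hour (1 or 2 or 3 ...).
2) TypeError: groupby() got multiple values for argument 'axis'. I want to group by two columns. It works with one column.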
Let's try isin for the membership test and pass the two grouping columns as a list:
constr['Class'] = np.where(constr['Hour of Occurrence'].isin([1,2,3,4,5,6]),'Offpeak','Onpeak')
sumsp = constr.groupby(['Constraint_ID','Class'],axis=0)['Shadow Price'].sum().sort_values(ascending=True)
print(sumsp)
Output:
Constraint_ID  Class
281292         Onpeak    -780.05
1049           Onpeak    -364.68
4636           Onpeak    -276.62
201082         Onpeak    -245.44
1607           Onpeak    -237.36
98333          Onpeak    -112.05
107318         Onpeak     -96.10
270366         Onpeak     -80.71
267644         Onpeak     -73.25
285770         Onpeak     -59.53
1049           Offpeak    -46.52
281292         Offpeak    -33.80
270888         Onpeak     -19.68
289484         Offpeak    -10.41
               Onpeak      -4.52
1607           Offpeak     -2.60
9712           Onpeak       0.84
268470         Onpeak       1.14
248010         Onpeak       1.48
287090         Onpeak       1.63
               Offpeak     11.78
188144         Offpeak     26.32
4862           Onpeak      28.03
285770         Offpeak     50.21
Name: Shadow Price, dtype: float64
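For anyone who cannot download the spreadsheet, here is a minimal self-contained sketch of the same two fixes. The column names follow the question, but the rows are made up for illustration, and the axis argument is left out because 0 is the default in pandas versions that accept it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
constr = pd.DataFrame({
    "Constraint_ID": [1049, 1049, 1049, 1607, 1607],
    "Hour of Occurrence": [3, 7, 8, 2, 20],
    "Shadow Price": [-10.0, -5.0, -2.5, 4.0, 1.5],
})

# isin() tests each hour for membership in the list and returns a boolean
# Series, which np.where then maps to the two labels.
offpeak_hours = [1, 2, 3, 4, 5, 6]
constr["Class"] = np.where(
    constr["Hour of Occurrence"].isin(offpeak_hours), "Offpeak", "Onpeak"
)

# Several grouping keys go in a single list; passing them as separate
# positional arguments makes pandas read the second one as another parameter.
sumsp = (
    constr.groupby(["Constraint_ID", "Class"])["Shadow Price"]
    .sum()
    .sort_values(ascending=True)
)
print(sumsp)
```

The result is a Series indexed by (Constraint_ID, Class) pairs, the same shape as the output above.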
Use unstack to pivot the Class level into columns: sumsp.unstack('Class')
Output:
Class          Offpeak   Onpeak
Constraint_ID
1049            -46.52  -364.68
1607             -2.60  -237.36
4636               NaN  -276.62
4862               NaN    28.03
9712               NaN     0.84
98333              NaN  -112.05
107318             NaN   -96.10
188144           26.32      NaN
201082             NaN  -245.44
248010             NaN     1.48
267644             NaN   -73.25
268470             NaN     1.14
270366             NaN   -80.71
270888             NaN   -19.68
281292          -33.80  -780.05
285770           50.21   -59.53
287090           11.78     1.63
289484          -10.41    -4.52
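The NaN cells mark (Constraint_ID, Class) combinations that had no rows in the grouped result. If zeros are easier to work with downstream, unstack accepts a fill_value argument. The sketch below rebuilds three of the sums above by hand so that it runs on its own. Treating "no rows" as a sum of 0 is an assumption that may not suit every analysis.

```python
import pandas as pd

# Three of the grouped sums from above, rebuilt by hand for illustration.
index = pd.MultiIndex.from_tuples(
    [(1049, "Offpeak"), (1049, "Onpeak"), (4636, "Onpeak")],
    names=["Constraint_ID", "Class"],
)
sumsp = pd.Series([-46.52, -364.68, -276.62], index=index, name="Shadow Price")

# Default behaviour: missing combinations become NaN.
wide = sumsp.unstack("Class")

# With fill_value, missing combinations get the given value instead.
wide_filled = sumsp.unstack("Class", fill_value=0)

print(wide)
print(wide_filled)
```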