
Creating boxplots in Pandas with multiple conditions

                         time        xco2         lon        lat  mask  front  flag          alt  type
time
2016-07-18 18:00:40  64835.00  400.345876  -77.665768  40.444690  1.00    2.0  0.00  3198.345000  warm
2016-07-18 18:00:50  64845.00  400.694926  -77.679259  40.450737  0.98    2.0  0.00  3199.400000  warm
2016-07-18 18:01:00  64855.00  401.107295  -77.692715  40.456796  0.98    2.0  0.00  3197.810000  warm
2016-07-18 18:01:10  64865.00  401.566160  -77.706165  40.462843  0.95    2.0  0.00  3196.500000  warm
2016-07-18 18:01:20  64875.00  401.752364  -77.719628  40.468837  1.00    2.0  0.00  3197.945000  warm
...                       ...         ...         ...        ...   ...    ...   ...          ...   ...
2016-07-18 18:50:30  67825.00  391.580408  -80.799363  41.847582  0.81    NaN  0.00  3158.575000  cold
2016-07-18 18:50:40  67835.00  392.728223  -80.809320  41.851846  1.00    NaN  0.00  3241.930000  cold
2016-07-18 18:50:50  67845.00  392.051042  -80.819123  41.855974  0.43    NaN  1.14  3340.510000  cold
2016-07-18 18:51:00  67855.00  392.827331  -80.828735  41.860006  1.00    NaN  0.00  3428.665000  cold
2016-07-18 18:51:10  67862.95  392.934952  -80.836415  41.863085  1.00    NaN  0.00  3483.171186  cold

304 rows × 9 columns

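I have many days of data to process, and my current approach is very time-consuming, so I need a more efficient way. I need the data separated into cold and warm, and I have a column that indicates this. Then I need each box-and-whisker to cover 0.5 degrees of latitude. At the moment I am manually creating a new column for each half degree of data. The image shows what I have been doing, along with a snapshot of how the data is set up. This is the old way, and it often needs even more columns: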

warm=np.arange(41.367440,44.13,0.25)
cold=np.arange(44.141705,46.321997,0.25)
print(warm)
print(cold)
xco2_0=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[0]) & (df_layer102['lat'] <= warm[1])]
xco2_1=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[1]) & (df_layer102['lat'] <= warm[2])]
xco2_2=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[2]) & (df_layer102['lat'] <= warm[3])]
xco2_3=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[3]) & (df_layer102['lat'] <= warm[4])]
xco2_4=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[4]) & (df_layer102['lat'] <= warm[5])]
xco2_5=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[5]) & (df_layer102['lat'] <= warm[6])]
xco2_6=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[6]) & (df_layer102['lat'] <= warm[7])]
xco2_7=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[7]) & (df_layer102['lat'] <= warm[8])]
xco2_8=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[8]) & (df_layer102['lat'] <= warm[9])]
xco2_9=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[9]) & (df_layer102['lat'] <= warm[10])]
xco2_10=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[10]) & (df_layer102['lat'] <= warm[11])]
# xco2_11=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[11]) & (df_layer10['lat'] <= warm[12])]
# xco2_12=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[12]) & (df_layer10['lat'] <= warm[13])]
# xco2_11=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[11]) & (df_layer10['lat'] <= cold[0])]
xco2_11=df_layer102['XCO2'].loc()[(df_layer102['lat'] >= cold[0]) & (df_layer102['lat'] <= cold[1])]
xco2_12=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[1]) & (df_layer102['lat'] <= cold[2])]
xco2_13=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[2]) & (df_layer102['lat'] <= cold[3])]
xco2_14=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[3]) & (df_layer102['lat'] <= cold[4])]
xco2_15=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[4]) & (df_layer102['lat'] <= cold[5])]
xco2_16=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[5]) & (df_layer102['lat'] <= cold[6])]
xco2_17=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[6]) & (df_layer102['lat'] <= cold[7])]
xco2_18=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[7]) & (df_layer102['lat'] <= cold[8])]
# xco2_19=df_layer10['XCO2'].loc()[(df_layer1['lat'] > cold[8]) & (df_layer10['lat'] <= cold[9])]
# xco2_19=df_avg_up05['xco2_up'].loc()[(df_avg_up05['lat_up'] > num1[5]) & (df_avg_up05['lat_up'] <= num1[6])]
# data_group_mid={'35 \°':xco2_35_36, '36 \°':xco2_36_37, '37 \°':xco2_37_38, '38 \°':xco2_38_39, '39 \°':xco2_39_40, '40 \°':xco2_40_41, '41 \°':xco2_41_42}
data_group_front={'46.14\°':xco2_18, '45.89\°':xco2_17, '45.64\°':xco2_16, '45.39\°':xco2_15,'45.14\°':xco2_14,'44.89\°':xco2_13,'44.69\°':xco2_12,'44.39\°':xco2_11,'44.11\°':xco2_10,'43.86\°':xco2_9, \
    '43.61\°':xco2_8,'43.36\°':xco2_7,'43.11\°':xco2_6,'42.86\°':xco2_5,'42.61\°':xco2_4,'42.36\°':xco2_3,'42.11\°':xco2_2,'41.86\°':xco2_1,'41.61\°':xco2_0}
df_xco2_front=pd.DataFrame(data=data_group_front)
df_xco2_front.count()

Method 1:

What you can do is create a new column that stores 'lat' bucketed with pd.cut:

df_layer102['lat_bucketed'] = pd.cut(df_layer102['lat'], np.append(warm, cold))

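Here warm and cold are combined into a single array of bin edges because they do not overlap and there is already a column indicating cold or warm, but you could also bin each of them separately.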
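As a minimal sketch of the pd.cut step, the example below uses a small made-up DataFrame (df_demo, a hypothetical stand-in for df_layer102 with illustrative values only) and the same warm and cold edges as in the question. It assumes the combined edges are strictly increasing, which pd.cut requires.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_layer102; the values are made up for illustration.
df_demo = pd.DataFrame({
    "lat": [41.40, 41.70, 42.00, 44.20, 44.50, 46.00],
    "XCO2": [400.3, 400.7, 401.1, 392.7, 392.1, 392.9],
    "type": ["warm", "warm", "warm", "cold", "cold", "cold"],
})

# Same edges as in the question: 0.25-degree steps on the warm and cold sides.
warm = np.arange(41.367440, 44.13, 0.25)
cold = np.arange(44.141705, 46.321997, 0.25)
edges = np.append(warm, cold)  # must be strictly increasing for pd.cut

# Each row gets the half-open interval (left, right] that contains its latitude.
df_demo["lat_bucketed"] = pd.cut(df_demo["lat"], edges)

# One count per (type, bucket) pair, instead of hand-written xco2_0 ... xco2_18.
counts = df_demo.groupby(["type", "lat_bucketed"], observed=True)["XCO2"].count()
print(counts)
```

Rows whose latitude falls outside the outermost edges get NaN in lat_bucketed, so it is worth checking for missing buckets before plotting.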


Method 2:

This can also be done manually:

df_layer102['lat_bucketed'] = ((df_layer102['lat'] - df_layer102['lat'].min())/0.5).astype(int)

This gives you a column of bucket indices (e.g. 0, 1, 2, and so on).
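To make the index arithmetic concrete, here is a small sketch on made-up data (df_demo, width and lat_lower are names introduced for illustration, not part of the original answer). Because the shifted latitudes are never negative, astype(int) acts as a floor here.

```python
import pandas as pd

# Hypothetical stand-in for df_layer102; values are illustrative only.
df_demo = pd.DataFrame({
    "lat": [40.44, 40.70, 41.10, 41.45, 41.86],
    "XCO2": [400.3, 400.7, 401.1, 391.6, 392.9],
})

width = 0.5  # bucket width in degrees of latitude
lat_min = df_demo["lat"].min()

# Integer bucket index: 0 for the first half degree above lat_min, 1 for the next, ...
df_demo["lat_bucketed"] = ((df_demo["lat"] - lat_min) / width).astype(int)

# Optional: map the index back to the lower edge of its bucket for axis labels.
df_demo["lat_lower"] = (lat_min + df_demo["lat_bucketed"] * width).round(2)

print(df_demo)
```

Plotting against lat_lower rather than the raw index keeps the x-axis readable as latitudes.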


Then, using seaborn, you can do:

import seaborn as sns
sns.set(style="whitegrid")
ax = sns.boxplot(x="lat_bucketed", y="XCO2", data=df_layer102)
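Since the question also asks for the boxes to be separated into warm and cold, one possible extension is to pass the existing type column as hue. The sketch below runs on randomly generated data (df_demo and its values are assumptions for illustration, not the asker's measurements) and selects a headless matplotlib backend so it works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df_layer102: latitudes, an XCO2-like value, warm/cold flag.
rng = np.random.default_rng(0)
n = 200
lat = rng.uniform(41.4, 46.3, n)
df_demo = pd.DataFrame({
    "lat": lat,
    "XCO2": 400 - 2 * (lat - 41.4) + rng.normal(0, 0.5, n),
    "type": np.where(lat < 44.13, "warm", "cold"),
})

# Half-degree bucket index, as in the manual method.
df_demo["lat_bucketed"] = ((df_demo["lat"] - df_demo["lat"].min()) / 0.5).astype(int)

sns.set(style="whitegrid")
# hue colours each box by the warm/cold column, one box per bucket and type.
ax = sns.boxplot(x="lat_bucketed", y="XCO2", hue="type", data=df_demo)
ax.set_xlabel("latitude bucket (0.5 degree steps)")
```

The figure can then be written to disk with ax.figure.savefig and a file name of your choice.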

