How to interpret pandas value_counts() output?
I have a DataFrame (called reshape_df in the code below):
                       date    O_3       NO_2       SO_2       PM10       PM25    CO  Label
0       2001-01-01 01:00:00   7.86  67.120003  26.459999  32.349998  12.505127  0.45    2.0
1       2001-01-01 02:00:00   7.21  70.620003  20.879999  40.709999  12.505127  0.48    2.0
2       2001-01-01 03:00:00   7.11  72.629997  21.580000  50.209999  12.505127  0.41    2.0
3       2001-01-01 04:00:00   7.14  75.029999  19.270000  54.880001  12.505127  0.51    2.0
4       2001-01-01 05:00:00   8.46  66.589996  13.640000  42.340000  12.505127  0.19    2.0
...                     ...    ...        ...        ...        ...        ...   ...    ...
139603  2018-04-30 20:00:00  63.00  58.000000   4.000000   2.000000   2.000000  0.30    1.0
139604  2018-04-30 21:00:00  49.00  65.000000   4.000000   5.000000   4.000000  0.30    2.0
139605  2018-04-30 22:00:00  49.00  58.000000   4.000000   5.000000   3.000000  0.30    2.0
139606  2018-04-30 23:00:00  48.00  52.000000   4.000000   7.000000   7.000000  0.30    2.0
139607  2018-05-01 00:00:00  52.00  43.000000   4.000000   6.000000   4.000000  0.30    1.0
I wanted to see how the 'Label' values are distributed, so I ran:
# Variability of 'Labels' values
reshape_df['Label'].value_counts()
I get:
2.0 80435
1.0 39393
3.0 15045
4.0 3295
5.0 1440
Name: Label, dtype: int64
I added a new column that shows, for each row, the name of the column holding the maximum value:
# Create column with max pollutant name
reshape_df['Max_pollutant'] = reshape_df.eq(reshape_df.max(1), axis=0).dot(reshape_df.columns)
I get:
                       date    O_3       NO_2       SO_2       PM10       PM25    CO  Label  Max_pollutant
0       2001-01-01 01:00:00   7.86  67.120003  26.459999  32.349998  12.505127  0.45    2.0           NO_2
1       2001-01-01 02:00:00   7.21  70.620003  20.879999  40.709999  12.505127  0.48    2.0           NO_2
2       2001-01-01 03:00:00   7.11  72.629997  21.580000  50.209999  12.505127  0.41    2.0           NO_2
3       2001-01-01 04:00:00   7.14  75.029999  19.270000  54.880001  12.505127  0.51    2.0           NO_2
4       2001-01-01 05:00:00   8.46  66.589996  13.640000  42.340000  12.505127  0.19    2.0           NO_2
...                     ...    ...        ...        ...        ...        ...   ...    ...            ...
139603  2018-04-30 20:00:00  63.00  58.000000   4.000000   2.000000   2.000000  0.30    1.0            O_3
139604  2018-04-30 21:00:00  49.00  65.000000   4.000000   5.000000   4.000000  0.30    2.0           NO_2
139605  2018-04-30 22:00:00  49.00  58.000000   4.000000   5.000000   3.000000  0.30    2.0           NO_2
139606  2018-04-30 23:00:00  48.00  52.000000   4.000000   7.000000   7.000000  0.30    2.0           NO_2
139607  2018-05-01 00:00:00  52.00  43.000000   4.000000   6.000000   4.000000  0.30    1.0            O_3
If I check the distribution of 'Max_pollutant':
# Variability of 'Max_pollutant' names
reshape_df['Max_pollutant'].value_counts()
I get the following output:
NO_2 91155
O_3 43166
PM10 4760
O_3NO_2 417
NO_2PM10 48
SO_2 23
O_3PM10 22
PM25 15
O_3NO_2PM10 2
Name: Max_pollutant, dtype: int64
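I don't quite understand the values in which two or more pollutants appear. For example, 'O_3NO_2' = 417: does this mean that in 417 rows the maximum value of O_3 is the same as that of NO_2?
How can I print those rows, specifically to see the readings for each pollutant?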
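Yes, those "odd" values occur when two or more columns share the same maximum value in a row. eq() marks every tied column as True, and dot() then concatenates the names of all marked columns, so 'O_3NO_2' means that O_3 and NO_2 were both equal to the row maximum in 417 rows.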
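To see the concatenation happen on a small scale, here is a minimal sketch. The toy DataFrame and its values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Toy data (hypothetical values, not from the original dataset)
toy = pd.DataFrame({
    "O_3":  [5.0, 9.0, 7.0],
    "NO_2": [8.0, 9.0, 7.0],
    "PM10": [1.0, 2.0, 7.0],
})

# True wherever a cell equals the maximum of its row
is_max = toy.eq(toy.max(axis=1), axis=0)

# Boolean-by-string dot product: True keeps the column name, False drops it,
# and the kept names are joined with no separator
names = is_max.dot(toy.columns)
print(names.tolist())  # ['NO_2', 'O_3NO_2', 'O_3NO_2PM10']
```

Row 0 has a single maximum, row 1 has a two-way tie, and row 2 has a three-way tie, which reproduces the three kinds of labels seen in the value_counts() output.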
For example, you can print them with the following command:
reshape_df.loc[reshape_df['Max_pollutant'] == 'O_3NO_2']
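If you want every tied row at once rather than one label at a time, one possible approach is to count how many columns reach the row maximum. The sketch below uses a small stand-in for reshape_df with made-up values. The pollutants list is an assumption about which columns should be compared. The comma separator and the idxmax() variant are optional alternatives, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for reshape_df; values are illustrative only
reshape_df = pd.DataFrame({
    "date": pd.to_datetime(["2001-01-01 01:00", "2001-01-01 02:00", "2001-01-01 03:00"]),
    "O_3":  [5.0, 9.0, 7.0],
    "NO_2": [8.0, 9.0, 7.0],
    "PM10": [1.0, 2.0, 7.0],
    "Label": [2.0, 1.0, 2.0],
})

pollutants = ["O_3", "NO_2", "PM10"]  # assumed list of pollutant columns
values = reshape_df[pollutants]

# True wherever a pollutant equals the maximum of its row
is_max = values.eq(values.max(axis=1), axis=0)

# Rows where more than one pollutant reaches the row maximum
tied = reshape_df.loc[is_max.sum(axis=1) > 1, ["date"] + pollutants]
print(tied)

# Same labels as before, but comma-separated so ties are easier to read
labels = is_max.dot(values.columns + ",").str.rstrip(",")

# idxmax() keeps only the first column that reaches the maximum
first_max = values.idxmax(axis=1)
```

Restricting the comparison to the pollutant columns also keeps date and Label out of the row maximum, which may or may not be what you want for your data.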