Trying to grab all column values that aren't equal to 0.000000 in Python 3
I have a dataset, shown below, and I am trying to grab each name in the feature column whose value in the importance column is not equal to 0.000000, and put them straight into a list I can use right away. I have tried a few methods, but the two that show the most promise are as follows:
new_features = []
for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)
new_features
Method 1 just grabs all of the values of the importance column, but I want the feature column values instead, so I tried method 2:
features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
Method 2 throws the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
      1 features_to_use = []
----> 2 for x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5
ValueError: too many values to unpack (expected 2)
Any help is greatly appreciated. I also tried unpacking three values per row:
features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
      1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5
ValueError: too many values to unpack (expected 3)
    feature                        importance
1 src_bytes 0.541433
18 count 0.160338
30 dst_host_diff_srv_rate 0.074743
53 service_bgp 0.066960
31 dst_host_same_src_port_rate 0.045040
28 dst_host_srv_count 0.027176
9 num_compromised 0.016684
25 diff_srv_rate 0.008991
58 service_pm_dump 0.008533
62 service_auth 0.008270
29 dst_host_same_srv_rate 0.006760
2 dst_bytes 0.005153
33 dst_host_serror_rate 0.004642
6 hot 0.003985
32 dst_host_srv_diff_host_rate 0.003330
35 dst_host_rerror_rate 0.002923
34 dst_host_srv_serror_rate 0.002222
87 service_klogin 0.002135
116 flag_SH 0.001553
0 duration 0.001263
7 num_failed_logins 0.001125
22 rerror_rate 0.001011
27 dst_host_count 0.000917
4 wrong_fragment 0.000736
52 service_ntp_u 0.000489
37 flag_RSTOS0 0.000468
3 land 0.000449
111 service_tftp_u 0.000355
19 srv_count 0.000289
8 logged_in 0.000284
... ... ...
16 is_host_login 0.000000
40 service_Z39_50 0.000000
41 service_http_443 0.000000
43 service_other 0.000000
44 protocol_type_tcp 0.000000
45 service_link 0.000000
46 service_X11 0.000000
47 service_exec 0.000000
48 service_red_i 0.000000
49 service_http_2784 0.000000
importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})

#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
#    if y > 0.000000:
#        features_to_use.append(x)
#
#features_to_use
KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
      1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
      3 #for s,x,y in importance_ranking:
      4 #    if y > 0.000000:
      5 #        features_to_use.append(x)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140
   2141     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147
   2148         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841
   3842         if not isna(item):
-> 3843             loc = self.items.get_loc(item)
   3844         else:
   3845             indexer = np.arange(len(self.items))[isna(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0
I think the best idea is to use boolean indexing:
df = importance_ranking[importance_ranking['importance']>0.000000]
and then get the features:
features = df['feature']
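As a self-contained sketch of this approach (using a small made-up frame in place of the real `importance_ranking`, which comes from `dt.feature_importances_` in the question):

```python
import pandas as pd

# Toy stand-in for importance_ranking; the real frame is built from
# all_cols and dt.feature_importances_ as shown in the question.
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count', 'is_host_login'],
    'importance': [0.541433, 0.160338, 0.000000],
})

# Boolean indexing keeps only the rows with non-zero importance.
df = importance_ranking[importance_ranking['importance'] > 0.0]

# Pull the feature names out of the filtered frame as a plain list.
features = df['feature'].tolist()
print(features)  # ['src_bytes', 'count']
```

The boolean Series `importance_ranking['importance'] > 0.0` acts as a row mask, so no explicit loop is needed.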
DataFrames offer a great way to select the data you want:
features_to_use = importance_ranking[importance_ranking['importance'] > 0.0]['feature'].values.tolist()
It may be difficult to understand at first sight, but what this actually does is filter out all the rows whose importance is greater than 0.0, then select the feature column of the rows that satisfy that condition. The trailing `.values.tolist()` just converts the result into a plain Python list.
If you feel uncomfortable with this solution, you can try doing it step by step:
df = importance_ranking[importance_ranking['importance'] > 0.0]  # Filtered DataFrame
feature_column = df['feature']                                   # Series object
features_to_use = feature_column.values.tolist()                 # Plain Python list
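For completeness: the unpacking errors in the question happen because iterating directly over a DataFrame yields its column labels, not its rows. If a loop is still preferred, `itertuples()` is one way to walk the rows (sketched here with a made-up stand-in frame):

```python
import pandas as pd

# Toy stand-in for the real importance_ranking frame from the question.
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'is_host_login'],
    'importance': [0.541433, 0.000000],
})

features_to_use = []
# itertuples() yields one namedtuple per row, so columns can be read
# by name instead of being unpacked positionally.
for row in importance_ranking.itertuples():
    if row.importance > 0.0:
        features_to_use.append(row.feature)

print(features_to_use)  # ['src_bytes']
```

The vectorised boolean-indexing version above is still the idiomatic choice; the loop is mainly useful when per-row logic gets more complicated.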