Trying to grab all column values that aren't equal to 0.000000 in Python 3
I have a dataset, shown below, and I am trying to grab each name in the feature column whose value in the importance column is not equal to 0.000000, and put them straight into a list I can use right away. I have tried a few methods, but the two that show the most promise are as follows:
new_features = []
for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)
new_features
Method 1 just grabs all of the values of the importance column, but I want the feature column values instead, so I tried method 2:
features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
Method 2 throws the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
      1 features_to_use = []
----> 2 for x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5
ValueError: too many values to unpack (expected 2)
Any help is greatly appreciated. I also tried unpacking three values per row:
features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
      1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5
ValueError: too many values to unpack (expected 3)
    feature                        importance
1 src_bytes 0.541433
18 count 0.160338
30 dst_host_diff_srv_rate 0.074743
53 service_bgp 0.066960
31 dst_host_same_src_port_rate 0.045040
28 dst_host_srv_count 0.027176
9 num_compromised 0.016684
25 diff_srv_rate 0.008991
58 service_pm_dump 0.008533
62 service_auth 0.008270
29 dst_host_same_srv_rate 0.006760
2 dst_bytes 0.005153
33 dst_host_serror_rate 0.004642
6 hot 0.003985
32 dst_host_srv_diff_host_rate 0.003330
35 dst_host_rerror_rate 0.002923
34 dst_host_srv_serror_rate 0.002222
87 service_klogin 0.002135
116 flag_SH 0.001553
0 duration 0.001263
7 num_failed_logins 0.001125
22 rerror_rate 0.001011
27 dst_host_count 0.000917
4 wrong_fragment 0.000736
52 service_ntp_u 0.000489
37 flag_RSTOS0 0.000468
3 land 0.000449
111 service_tftp_u 0.000355
19 srv_count 0.000289
8 logged_in 0.000284
... ... ...
16 is_host_login 0.000000
40 service_Z39_50 0.000000
41 service_http_443 0.000000
43 service_other 0.000000
44 protocol_type_tcp 0.000000
45 service_link 0.000000
46 service_X11 0.000000
47 service_exec 0.000000
48 service_red_i 0.000000
49 service_http_2784 0.000000
importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})

#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
#    if y > 0.000000:
#        features_to_use.append(x)
#
#features_to_use
KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
      1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
      3 #for s,x,y in importance_ranking:
      4 #    if y > 0.000000:
      5 #        features_to_use.append(x)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140
   2141     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147
   2148         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841
   3842         if not isna(item):
-> 3843             loc = self.items.get_loc(item)
   3844         else:
   3845             indexer = np.arange(len(self.items))[isna(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0
I think the best idea is to use boolean indexing:
df = importance_ranking[importance_ranking['importance']>0.000000]
and then get the features:
features = df['feature']
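As a self-contained sketch of this approach (using a small made-up frame in place of the real `importance_ranking`, which comes from `dt.feature_importances_` in the question):

```python
import pandas as pd

# Toy stand-in for importance_ranking; the real frame is built from
# all_cols and dt.feature_importances_ as shown in the question.
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count', 'is_host_login'],
    'importance': [0.541433, 0.160338, 0.000000],
})

# Boolean indexing keeps only the rows with non-zero importance.
df = importance_ranking[importance_ranking['importance'] > 0.0]

# Pull the feature names out of the filtered frame as a plain list.
features = df['feature'].tolist()
print(features)  # ['src_bytes', 'count']
```

The boolean Series `importance_ranking['importance'] > 0.0` acts as a row mask, so no explicit loop is needed.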
DataFrames offer a great way to select the data you want:
features_to_use = importance_ranking[importance_ranking['importance'] > 0.0]['feature'].values.tolist()
It may be difficult to understand at first sight, but what this actually does is filter out all the rows whose importance is greater than 0.0, then select the feature column of the rows that satisfy that condition. The trailing `.values.tolist()` just converts the result into a plain Python list.
If you feel uncomfortable with this solution, you can try doing it step by step:
df = importance_ranking[importance_ranking['importance'] > 0.0]  # Filtered DataFrame
feature_column = df['feature']                                   # Series object
features_to_use = feature_column.values.tolist()                 # Plain Python list
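For completeness: the unpacking errors in the question happen because iterating directly over a DataFrame yields its column labels, not its rows. If a loop is still preferred, `itertuples()` is one way to walk the rows (sketched here with a made-up stand-in frame):

```python
import pandas as pd

# Toy stand-in for the real importance_ranking frame from the question.
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'is_host_login'],
    'importance': [0.541433, 0.000000],
})

features_to_use = []
# itertuples() yields one namedtuple per row, so columns can be read
# by name instead of being unpacked positionally.
for row in importance_ranking.itertuples():
    if row.importance > 0.0:
        features_to_use.append(row.feature)

print(features_to_use)  # ['src_bytes']
```

The vectorised boolean-indexing version above is still the idiomatic choice; the loop is mainly useful when per-row logic gets more complicated.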