How to fix 'numpy.ndarray' error while making training data-set using CountVectorizer?
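I am doing text classification on data that I have. Based on some observations, I need to determine the target variable. I started with the bag-of-words and tf/idf approaches.
I have the classifier working with a single feature, but when I try to combine more features (7, for example) to predict the label, the count vectorizer throws an error in fit_transform. Here is the code: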
from sklearn import preprocessing
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split  # needed for the split below

bow = CountVectorizer()

# working fine for one feature
#observation = df_all_null_removed['Observation'].values

# selecting feature set of 7 variables
observation = df_all_null_removed[features].values

train_obs, test_obs, y_train, y_test = train_test_split(
    observation,
    df_all_null_removed['HazardType'],
    test_size=0.12,
    random_state=42)

bow_matrix = bow.fit_transform(observation)  # throws error - screen shot attached.
I think this is because `observation` is a two-dimensional numpy array of shape [8150, 7], and we need to convert it into a single-column array with 8150 rows.
AttributeError                            Traceback (most recent call last)
<ipython-input-140-d75b27bd1080> in <module>()
----> 1 bow_matrix = bow.fit_transform(observation)
      2 print("The vocabulary of the bow",len(bow.vocabulary_))

~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
    867
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870
    871         if self.binary:

~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
    790             for doc in raw_documents:
    791                 feature_counter = {}
--> 792                 for feature in analyze(doc):
    793                     try:
    794                         feature_idx = vocabulary[feature]

~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(doc)
    264
    265             return lambda doc: self._word_ngrams(
--> 266                 tokenize(preprocess(self.decode(doc))), stop_words)
    267
    268         else:

~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(x)
    230
    231         if self.lowercase:
--> 232             return lambda x: strip_accents(x.lower())
    233         else:
    234             return strip_accents

AttributeError: 'numpy.ndarray' object has no attribute 'lower'
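The traceback is consistent with the 2-D shape idea: iterating over a two-dimensional array yields row arrays rather than strings, and the analyzer then calls `.lower()` on each of them. One possible workaround, sketched below, is to join the columns of each row into a single string before vectorizing. The sample rows are made up for illustration and only stand in for `df_all_null_removed[features].values`; this approach also assumes it is acceptable for all columns to share one vocabulary.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df_all_null_removed[features].values:
# 3 rows and 2 text columns (the real array is reported as 8150 x 7).
observation = np.array([
    ["oil spill near pump", "floor was wet"],
    ["loose cable on walkway", "no warning sign"],
    ["pump leaking oil", "area not cordoned"],
], dtype=object)

# Iterating a 2-D array yields 1-D row arrays, not strings,
# so the x.lower() call in the traceback has nothing to work on.
first_doc = next(iter(observation))
print(type(first_doc).__name__, hasattr(first_doc, "lower"))

# Collapse each row into one document string.
documents = [" ".join(map(str, row)) for row in observation]

bow = CountVectorizer()
bow_matrix = bow.fit_transform(documents)
print(bow_matrix.shape[0], len(bow.vocabulary_))
```

Joining the columns loses the information about which column a word came from, so it fits best when the columns are all free text of the same kind.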
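You can use ColumnTransformer to give different parts of your data separate preprocessing paths. CountVectorizer expects a one-dimensional sequence of strings, so each text column can be routed to its own vectorizer.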
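A minimal sketch of that suggestion follows. The DataFrame, the `Location` column, and the name `column_bow` are hypothetical and only stand in for `df_all_null_removed` and the seven entries of `features`. It assumes scikit-learn 0.20 or later, where `sklearn.compose.ColumnTransformer` is available.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame standing in for df_all_null_removed.
df = pd.DataFrame({
    "Observation": ["oil spill near pump", "loose cable on walkway", "pump leaking oil"],
    "Location": ["pump room", "main walkway", "pump room"],
})
text_columns = ["Observation", "Location"]  # stand-in for `features`

# One CountVectorizer per column. Each column is selected with a plain
# string (not a list), so the vectorizer receives a 1-D sequence of documents.
column_bow = ColumnTransformer(
    [(f"bow_{col}", CountVectorizer(), col) for col in text_columns]
)

# The per-column count matrices are stacked side by side.
bow_matrix = column_bow.fit_transform(df)
print(bow_matrix.shape)
```

With a train/test split, the transformer would normally be fitted on the training rows only and then applied to the test rows with `transform`, so both share the same vocabularies.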