NLTK TypeError: unhashable type: 'list'
I am currently working on lemmatizing the words in a CSV file. Before that step, I convert all words to lowercase, remove all punctuation, and split the column.
I use only two CSV columns. This is the output of analyze.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4637 entries, 0 to 4636
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype
 0   Comments        4637 non-null   object
 1   Classification  4637 non-null   object
import string
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
analyze = pd.read_csv('C:/Users/(..)/Talk London/ALL_dataset.csv', delimiter=';', low_memory=False, encoding='cp1252', usecols=['Comments', 'Classification'])
lower_case = analyze['Comments'].str.lower()
cleaned_text = lower_case.str.translate(str.maketrans('', '', string.punctuation))
tokenized_words = cleaned_text.str.split()
final_words = []
for word in tokenized_words:
    if word not in stopwords.words('english'):
        final_words.append(word)
wnl = WordNetLemmatizer()
lemma_words = []
lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
lemma_words.append(lem)
When I run the code, it returns this error:
Traceback (most recent call last):
  File "C:/Users/suiso/PycharmProjects/SA_working/SA_Main.py", line 52, in <module>
    lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
  File "C:/Users/suiso/PycharmProjects/SA_working/SA_Main.py", line 52, in <listcomp>
    lem = ' '.join([wnl.lemmatize(word) for word in tokenized_words])
  File "C:\Users\suiso\PycharmProjects\SA_working\venv\lib\site-packages\nltk\stem\wordnet.py", line 38, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "C:\Users\suiso\PycharmProjects\SA_working\venv\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1897, in _morphy
    if form in exceptions:
TypeError: unhashable type: 'list'
tokenized_words is a column of lists, not a column of strings, because you used the split method: each row now holds the list of tokens for one comment. Iterating over the column therefore passes a whole list to wnl.lemmatize, which expects a single string. So you need a double for loop, like so:
lem = ' '.join([wnl.lemmatize(word) for word_list in tokenized_words for word in word_list])
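To see why the list triggers the error and how the double loop avoids it, here is a minimal sketch that runs without NLTK or pandas. It assumes a stand-in lemmatizer: `toy_lemmatize` and `EXCEPTIONS` are hypothetical names made up for illustration, not part of NLTK. They only mimic the dictionary membership test (`if form in exceptions`) shown in the traceback. The sketch also shows a per-row variant, in case you want one lemmatized string per comment rather than one string for the whole column.

```python
# Hypothetical stand-in for WordNetLemmatizer: a plain dict lookup.
# Like the `if form in exceptions` line in the traceback, the membership
# test has to hash its argument, so it fails for a list.
EXCEPTIONS = {"geese": "goose", "mice": "mouse", "feet": "foot"}


def toy_lemmatize(word):
    # Works for a str; raises TypeError when `word` is a list.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return word


# Each row is a list of tokens, like the result of Series.str.split().
tokenized_rows = [["the", "geese", "fly"], ["two", "mice", "ran"]]

# Passing a whole row (a list) reproduces the error from the question.
try:
    toy_lemmatize(tokenized_rows[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flattened: one string for the entire column (the double loop above).
flat = " ".join(toy_lemmatize(w) for row in tokenized_rows for w in row)

# Per row: one lemmatized string per comment, so rows stay aligned
# with the other columns.
per_row = [" ".join(toy_lemmatize(w) for w in row) for row in tokenized_rows]

print(error_message)
print(flat)
print(per_row)
```

With the real data, the per-row variant could probably be written with pandas' Series.apply, for example tokenized_words.apply(lambda words: ' '.join(wnl.lemmatize(w) for w in words)). This assumes every row of tokenized_words is a list; rows with missing comments would need to be handled first.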