Applying a function to all columns in a numpy array
I am very new to numpy.
Just wondering why this does not work.
print items['description']
yields
0 Продам Камаз 6520 20 тонн
1 Весь в тюнинге.
2 Телефон в хорошем состоянии, трещин и сколов н...
3 Отличный подарок на новый год от "китайской ap...
4 Лыжные ботинки в хорошем состоянии, 34 размер
Name: description, dtype: object
I am trying to apply this method to all rows in this column.
items['description'] = vectorize_sentence(items['description'].astype(str))
Here is the function definition for vectorize_sentence.
def vectorize_sentence(self, sentence):
    # Tokenize
    print 'sentence', sentence
    tkns = self._tokenize(sentence)
    vec = None
    for tkn in tkns:
        print 'tkn', tkn.decode('utf-8')
        print type(tkn)
        if self.model[tkn.decode('utf-8')]:
            vec = sum(vec, self.model[tkn.decode('utf-8')])
    #vec = sum([self.model[x] for x in tkns if x in self.model])
    #print vec

def _tokenize(self, sentence):
    return sentence.split(' ')
Error message:
AttributeError: 'Series' object has no attribute 'split'
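You are getting that error because a 'Series' object has no attribute 'split'. Calling .astype(str) does not return a single long string as you might expect; it returns a Series whose elements are strings.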
items = pd.DataFrame({'description': ['bob loblaw', 'john wayne', 'lady gaga loves to sing']})
sentence = items['description'].astype(str)
sentence.split(' ')
Now try
sentence = ' '.join(x for x in items['description'])
sentence.split(' ')
Then implement it in your function
def _tokenize(self, sentence):
    return ' '.join(x for x in items['description']).split(' ')
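If the goal is to tokenize each row separately rather than merge every description into one long string, Series.apply calls a function once per element. Here is a minimal sketch, assuming a standalone helper named tokenize (hypothetical, standing in for the _tokenize method) and the same toy DataFrame as above; it is written for Python 3 and is not the original poster's class.

```python
import pandas as pd

def tokenize(sentence):
    # Hypothetical stand-in for _tokenize: split one string on spaces
    return sentence.split(' ')

items = pd.DataFrame({'description': ['bob loblaw', 'john wayne', 'lady gaga loves to sing']})

# apply() passes each element (a str) to tokenize, not the whole Series
tokens = items['description'].astype(str).apply(tokenize)
print(tokens.tolist())
```

For plain splitting, items['description'].str.split(' ') should give the same per-row lists without a helper function.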