How can I replace the values of a pandas.Series with stemmed sentences?
I have a pandas.Series named 'traindata':
0 Published: 4:53AM Friday August 29, 2014 Sourc...
1 8 Have your say\n\n\nPlaying low-level club c...
2 Rohit Shetty has now turned producer. But the ...
3 A TV reporter in Serbia almost lost her job be...
4 THE HAGUE -- Tony de Brum was 9 years old in 1...
5 Australian TV cameraman Harry Burton was kille...
6 President Barack Obama sharply rebuked protest...
7 The car displaying the DIE FOR SYRIA! sticker....
8 \nIf you've ever been, you know that seeing th...
9 \nThe former executive director of JBWere has ...
10 Waterloo Road actor Joe Slater has revealed hi...
...
Name: traindata, Length: 2284, dtype: object
What I want to do is replace the Series values with the stemmed sentences.
My idea is to build a new container and put the stemmed sentences into it. My code is below:
import numpy as np
from nltk import word_tokenize
from nltk.stem.porter import PorterStemmer

stem_word_data = np.zeros([2284,1])
ps = PorterStemmer()
for i in range(0,len(traindata)):
    tst = word_tokenize(traindata[i])
    for word in tst:
        word = ps.stem(word)
        stem_word_data[i] = word
Then an error occurs:
ValueError: could not convert string to float: 'publish'
Does anyone know how to fix this error, or have a better idea for replacing the Series values with the stemmed sentences? Thanks.
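The error itself can be reproduced without NLTK. np.zeros creates a float64 array, so assigning a string such as 'publish' to one of its cells makes NumPy try to parse it as a number. The sketch below uses a shortened 3-row array as a stand-in for the 2284-row one, and the names message and stemmed are only illustrative. Note also that, independent of the dtype problem, the loop in the question would overwrite row i with each word in turn, so only the last stemmed word of each sentence would remain.

```python
import numpy as np

# Same construction as in the question, shortened to 3 rows for illustration.
stem_word_data = np.zeros([3, 1])
print(stem_word_data.dtype)  # the array holds floats, not strings

message = None
try:
    # Assigning a string to a float cell triggers a float conversion.
    stem_word_data[0] = "publish"
except ValueError as err:
    message = str(err)
    print(message)

# A plain Python list (or an object-dtype pandas Series) can hold strings.
stemmed = [""] * 3
stemmed[0] = "publish"
```

A container that accepts strings avoids the ValueError, but the stemmed words of each sentence still need to be joined into one string before being stored.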
You can use apply on a Series and avoid writing loops.
import pandas as pd
from nltk import word_tokenize
from nltk.stem import PorterStemmer

## initialise the stemmer
pst = PorterStemmer()

## sample data frame
df = pd.DataFrame({'senten': ['I am not dancing','You are playing']})

## tokenize each sentence, then stem each token and rejoin into one string
df['senten'] = df['senten'].apply(word_tokenize)
df['senten'] = df['senten'].apply(lambda x: ' '.join([pst.stem(y) for y in x]))
print(df)
senten
0 I am not danc
1 you are play
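The same idea works directly on a Series such as traindata, without wrapping it in a DataFrame. The following is a minimal sketch under a few assumptions: the two-row Series is a hypothetical stand-in for the real 2284-row data, stem_sentence is an illustrative helper name, and str.split() is used as a simple whitespace tokenizer so the example runs without the NLTK tokenizer data. Swapping in word_tokenize should give the behaviour shown above.

```python
import pandas as pd
from nltk.stem import PorterStemmer

ps = PorterStemmer()

def stem_sentence(text):
    """Stem every whitespace-separated token and rejoin into one string."""
    # str.split() is a stand-in tokenizer (an assumption for this sketch);
    # nltk.word_tokenize can be used instead if its data is installed.
    return " ".join(ps.stem(token) for token in text.split())

# Hypothetical stand-in for the Series from the question.
traindata = pd.Series(["I am not dancing", "You are playing"], name="traindata")

# Series.apply returns a new Series with the same index;
# assigning it back replaces the original values.
traindata = traindata.apply(stem_sentence)
print(traindata)
```

Because apply builds a new Series of strings, there is no need to preallocate an array, which sidesteps the float conversion error from the question.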