How to singularize and lemmatize an entire pandas DataFrame column using TextBlob?

Question

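I have a pandas DataFrame with the following columns: df['adjectives'], df['nouns'], and df['adverbs']. Each of these columns contains a list of tokens selected by their respective part of speech.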

I would like to use TextBlob to create three new columns in the DataFrame: df['adjlemmatized'], df['nounlemmatized'], and df['advlemmatized'].

Each of these columns should contain a list of words in their singular, lemmatized form.

I have tried following the TextBlob documentation, but I am stuck on writing a function that iterates over the entire DataFrame.

Words Inflection and Lemmatization

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode) with useful methods, e.g. for word inflection.

>>> sentence = TextBlob('Use 4 spaces per indentation level.')
>>> sentence.words
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
>>> sentence.words[2].singularize()
'space'
>>> sentence.words[-1].pluralize()
'levels'
Words can be lemmatized by calling the lemmatize method.

>>> from textblob import Word
>>> w = Word("octopi")
>>> w.lemmatize()
'octopus'
>>> w = Word("went")
>>> w.lemmatize("v")  # Pass in WordNet part of speech (verb)
'go'

This is the code I used to get the parts of speech from the text:

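from textblob import TextBlob

# get adjectives
def get_adjectives(text):
    blob = TextBlob(text)
    print(text)  # debug output
    # Penn Treebank adjective tags (JJ, JJR, JJS) all start with "JJ"
    return [word for (word, tag) in blob.tags if tag.startswith("JJ")]

df['adjectives'] = df['clean_reviews'].apply(get_adjectives)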

Answer 1

If your words are already tokenized and you want to keep them that way, it's simple:

from textblob import Word  # lemmatize() is a method of Word, not TextBlob

df['adjlemmatized'] = df.adjectives.apply(lambda x: [Word(w) for w in x])
df['adjlemmatized'] = df.adjlemmatized.apply(lambda x: [w.lemmatize() for w in x])

Answer 1
0 2019-08-07 18:26:29
