Python: count the words in each row and save the result in a new column

Question

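I'm new to Python and have just started learning how to process data, and I've run into some trouble.

I have a dataset (pandas) in which every row contains a sentence. I want to create a new column that counts the words in each row's sentence.

If the sentence is "Hello World Hello dogs", the word counter would be:

{'Hello': 2, 'World': 1, 'dogs': 1}

I usually use graphlab, where this is done as follows: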

dataset['new_column'] = graphlab.text_analytics.count_words(..)

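I have seen many similar solutions, but none of them showed the result being added to the dataset as a new column, and I have never really programmed in Python before.

Any guidance would be appreciated.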

Answer 1

I would advise against storing dictionaries in the cells of a DataFrame, but if there is no way around it, you can use Counter:

import pandas as pd
from collections import Counter

dataset = pd.DataFrame([['Hello world dogs'], ['this is another sentence']], columns=['column_of_interest'])

# Split each sentence on spaces and count the resulting words
dataset['new_column'] = dataset.column_of_interest.apply(lambda x: Counter(x.split(' ')))
dataset

   column_of_interest        new_column
0  Hello world dogs          {'dogs': 1, 'world': 1, 'Hello': 1}
1  this is another sentence  {'is': 1, 'sentence': 1, 'this': 1, 'another': 1}
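
One way to follow the advice about not keeping dictionaries in cells is a "long" table with one row per (sentence, word) pair. The sketch below is only an illustration of that idea, not something taken from the answer: the names `long_counts`, `row`, `word`, and `count` are made up here, and the sample sentence is the one from the question.

```python
import pandas as pd
from collections import Counter

dataset = pd.DataFrame(
    [["Hello World Hello dogs"], ["this is another sentence"]],
    columns=["column_of_interest"],
)

# Build one record per (row index, word) pair instead of one dict per cell.
# The column names below are hypothetical, chosen only for this sketch.
records = [
    {"row": idx, "word": word, "count": n}
    for idx, sentence in dataset["column_of_interest"].items()
    for word, n in Counter(sentence.split()).items()
]
long_counts = pd.DataFrame(records, columns=["row", "word", "count"])
print(long_counts)
```

With this layout, ordinary filtering works on the counts, for example `long_counts[long_counts["word"] == "Hello"]`.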

Edit: based on the comments below, if some cells do not contain strings, you may need to convert them to str before splitting: lambda x: Counter(str(x).split(' '))
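
A small caveat about the str() conversion, shown as a sketch: it stops the lambda from raising on non-string cells, but the text form of the value (for example 'None' or 'nan') is then counted as a word. The helper `count_words` below is a hypothetical alternative that returns an empty Counter for anything that is not a string; whether that is the right behaviour depends on your data.

```python
from collections import Counter

def count_words(cell):
    """Hypothetical helper: count words, treating non-string cells as empty."""
    if not isinstance(cell, str):
        return Counter()
    return Counter(cell.split(" "))

# With str(), the placeholder text itself ends up in the counts.
print(Counter(str(None).split(" ")))          # Counter({'None': 1})
print(Counter(str(float("nan")).split(" ")))  # Counter({'nan': 1})

# With the type check, non-string cells simply produce no words.
print(count_words(None))                      # Counter()
print(count_words("Hello World Hello dogs")["Hello"])  # 2
```

It can be passed to apply in the same way as the lambda, e.g. `dataset.column_of_interest.apply(count_words)`.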

Answer 2

The accepted answer does the job.

In case anyone wants it, here is an answer where the counting itself needs no pandas:

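def word_count(text):
    # Map each whitespace-separated word to the number of times it occurs
    counts = {}
    for word in text.split():
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    return counts

# Assumes `data` is a DataFrame with a 'sentences' column
data['word_count'] = data['sentences'].apply(word_count)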

Test:

print(word_count("Hello Hello world"))

Output:

{'Hello': 2, 'world': 1}
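
As a quick cross-check, the hand-written loop gives the same counts as collections.Counter. The sketch below also shows one difference between the two answers: split() with no argument collapses runs of whitespace, while split(' ') produces empty strings when there are consecutive spaces. The sample sentence with a double space is made up for the illustration.

```python
from collections import Counter

def word_count(text):
    # Same idea as the loop above, written with dict.get for brevity
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

sentence = "Hello  Hello world"  # note the double space

# The manual count matches Counter on the same split
print(word_count(sentence) == dict(Counter(sentence.split())))  # True

print(sentence.split())     # ['Hello', 'Hello', 'world']
print(sentence.split(" "))  # ['Hello', '', 'Hello', 'world']
```

If the sentences may contain repeated spaces, tabs, or newlines, split() without an argument is usually the safer choice for counting words.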

Answer 1
2 Accepted 2017-08-08 16:34:35

Answer 2
1 2017-09-02 09:36:03