Python column parser for a keyword DataFrame

Question

The link below is a sample of the data source I want to parse.

http://www.mediafire.com/file/wfri4idoxszqixs/sampleWordData.xlsx

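I have a column of phrases, each with an associated amount. I want to split each row into its individual words and attach the row's amount to every word. For example: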

Original DataFrame
Words (Col 1), Amount (Col 2)
Words = ["Google", "Google is awesome", "Hi Google"]
Amount = [5, 10, 5]

New DataFrame
Word1 (Col 1), Word2 (Col 2), Word3 (Col 3), Amount (Col 4)
Word1 = ['Google', 'Google', 'Hi']
Word2 = ['', 'is', 'Google']
Word3 = ['', 'awesome', '']
Amount = [5, 10, 5]

Final DataFrame
Word = ['Google', 'is', 'awesome', 'Hi']
Amount = [20, 10, 10, 5]

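I have explained this as best I can, since it is hard to get markdown to work with a column layout. The xlsx file shows each step of how I am trying to transform the data.

My attempt at the code is below: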

import pandas as pd

# load the dataset
df = pd.read_csv('myfile.csv')
df.columns = ['words', 'amount']
df.head()

# get rid of nulls
df.dropna(subset=['words'], inplace=True)

# shows me how many columns are needed in total to encompass the longest line
print(df.words.str.split(expand=True).head())

# attempt to split out the first word from the bunch of words per row
df2 = pd.DataFrame(df.words.str.split(' ', n=1).tolist(),
                   columns=['word1', 'word2'])

Any help or guidance would be appreciated!

Answer 1

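I expect someone can come up with something more elegant than this.

Split each string of words into a list, stored in a new column called words.
Multiply these lists by the Amount column, then use Counter to count the words in each repeated list.
Use the external function aggregator to accumulate these counts in result.
Finally, construct a new DataFrame from the aggregated data.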
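
To see why multiplying a list by the amount works, here is a minimal sketch of the second and third steps in plain Python, without pandas. The sample rows are taken from the question; the names `rows` and `totals` are made up for this illustration.

```python
from collections import Counter

# Sample rows from the question: (phrase, amount)
rows = [("Google", 5), ("Google is awesome", 10), ("Hi Google", 5)]

totals = Counter()
for text, amount in rows:
    # Repeating the word list `amount` times makes Counter
    # count every word in the row `amount` times
    totals += Counter(text.split() * amount)

print(dict(totals))
# {'Google': 20, 'is': 10, 'awesome': 10, 'Hi': 5}
```

The pandas version that follows does the same thing row by row, with the repetition expressed as a column-wise multiplication.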

import pandas as pd
from collections import Counter, defaultdict

# running totals per word, filled in by aggregator()
result = defaultdict(int)

def aggregator(counter):
    # add one row's word counts to the running totals
    for k in counter.keys():
        result[k] += counter[k]

df = pd.read_excel('sampleWordData.xlsx', header=0)
# split each string into a list of words
df['words'] = df['Word'].str.split()
# repeat each word list Amount times, then count the words
df['counts'] = (df['words'] * df['Amount']).apply(Counter)
df.counts.apply(aggregator)
new_df = pd.DataFrame({'words': list(result.keys()), 'counts': list(result.values())})
print(new_df)

Printed result:

   counts    words
0      20   Google
1      10       is
2      10  awesome
3       5       Hi
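
As one possible shorter alternative, the same totals can be computed with `explode` and `groupby`. This is a sketch rather than part of the original answer: it assumes pandas 0.25 or later (where `DataFrame.explode` is available), it builds the sample data inline instead of reading the spreadsheet, and it assumes the column names `Word` and `Amount` used in the code above. The name `summary` is made up for this example.

```python
import pandas as pd

# Sample data copied from the question; in practice this would come
# from pd.read_excel('sampleWordData.xlsx')
df = pd.DataFrame({
    "Word": ["Google", "Google is awesome", "Hi Google"],
    "Amount": [5, 10, 5],
})

summary = (
    df.assign(Word=df["Word"].str.split())   # each cell becomes a list of words
      .explode("Word")                       # one row per word, Amount repeated
      .groupby("Word", sort=False)["Amount"]  # keep first-appearance order
      .sum()
      .reset_index()
)
print(summary)
```

For the sample data this gives Google 20, is 10, awesome 10 and Hi 5, matching the result above. Like the Counter approach, a word that appears twice in one row has that row's amount added twice.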

Answer 1
0 2018-02-28 23:41:16
