pandas dataframe append values to one column based on the values in another dataframe
I have two dataframes. df1 is the result of a groupby, i.e. df.groupby('keyword'):
df1
keyword string
A "This is a test string for the example"
"This is also a test string based on the other string"
"This string is a test string based on the other strings"
B "You can probably guess that this is also a test string"
"Yet again, another test string"
"This is also a test"
and df2, which is an empty dataframe. I also have a list of specific values:
keyword_list = ['string', 'test']
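Basically, I want to count how often each word in keyword_list appears in df1 and, for each keyword in df1, append the word that occurs most often to a specific column in the new dataframe. So df2's 'A' should be assigned the word that appears most often in the string column of df1's 'A' rows.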
Ideally, because 'string' is the most frequent word in the string column for keyword A in df1, A would be assigned "string", and so on:
df2
keyword High_freq_word
A "string"
B "test"
Let me know if anything needs clarification!
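A minimal sketch of the computation being asked for, under an assumption the question does not state: df1 is a flat DataFrame whose keyword column is filled only on the first row of each group and is NaN elsewhere, which is how the display above reads. It counts whole-word occurrences of each entry of keyword_list with `Series.str.count`, sums the counts per keyword, and picks the column with the largest total. The sample sentences are copied from the question; `key` and `counts` are illustrative names.

```python
import re

import numpy as np
import pandas as pd

# Assumed layout of df1: keyword only on the first row of each group.
df1 = pd.DataFrame({
    "keyword": ["A", np.nan, np.nan, "B", np.nan, np.nan],
    "string": [
        "This is a test string for the example",
        "This is also a test string based on the other string",
        "This string is a test string based on the other strings",
        "You can probably guess that this is also a test string",
        "Yet again, another test string",
        "This is also a test",
    ],
})
keyword_list = ["string", "test"]

# Forward-fill so every row knows which keyword group it belongs to.
key = df1["keyword"].ffill()

# Whole-word counts per row; the \b anchors stop 'strings' counting as 'string'.
counts = pd.DataFrame({
    word: df1["string"].str.count(rf"\b{re.escape(word)}\b")
    for word in keyword_list
}).groupby(key).sum()

# Column label with the highest total per keyword becomes the new column.
df2 = counts.idxmax(axis=1).rename("High_freq_word").reset_index()
print(df2)
```

On a tie, `idxmax` returns the first column, i.e. the word that comes first in keyword_list. Dropping the `\b` anchors gives plain substring counting instead.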
Update:
@anky_91 provided some great code, but the output is a bit awkward:
from itertools import chain
from scipy.stats import mode
df['matches'] = df.string.str.findall('|'.join(keyword_list))
df.groupby(df.keyword.ffill()).matches.apply(lambda x: ''.join(mode(list(chain.from_iterable(x)))[0]))
This gives you
df1
keyword string
A "This is a test string for the example"
"This is also a test string based on the other string"
"This string is a test string based on the other strings"
B "You can probably guess that this is also a test string"
"Yet again, another test string"
"This is also a test"
but it also adds a new column:
matches
['test', 'string']
['test', 'string', 'string']
[etc...]
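I can think of a way to convert this numerically and then assign that value to a column, but the bigger problem is appending this new column to a new dataframe.
Since it's a groupby, there are several duplicate values, and I'm trying to find a Pythonic way to map the "most frequent word" to the keyword itself, rather than a single mode computed over the whole keyword list.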
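One way to do the numeric conversion mentioned above is sketched here, assuming the keyword column has already been forward-filled so every row carries its keyword. `DataFrame.explode` turns each list in `matches` into one row per match, `value_counts` counts each word within each keyword, and `unstack` pivots the words into columns. The sample data mirrors the question; `counts` is an illustrative name.

```python
import pandas as pd

# Sample data from the question, keyword assumed already filled on every row.
df = pd.DataFrame({
    "keyword": ["A", "A", "A", "B", "B", "B"],
    "string": [
        "This is a test string for the example",
        "This is also a test string based on the other string",
        "This string is a test string based on the other strings",
        "You can probably guess that this is also a test string",
        "Yet again, another test string",
        "This is also a test",
    ],
})
keyword_list = ["string", "test"]

# Regex alternation 'string|test' gives one list of matches per row;
# matching is substring-based, so 'strings' yields a 'string' match.
df["matches"] = df["string"].str.findall("|".join(keyword_list))

# One row per match, count each word per keyword, words become columns.
counts = (
    df.explode("matches")
      .groupby("keyword")["matches"]
      .value_counts()
      .unstack(fill_value=0)
)

# Most frequent word per keyword, as a two-column DataFrame.
df2 = counts.idxmax(axis=1).rename("High_freq_word").reset_index()
print(counts)
print(df2)
```

Rows with no matches explode to NaN, which `value_counts` drops by default, so they do not affect the counts.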
As far as I understand, you can do this:
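from itertools import chain
from scipy.stats import mode
keyword_list = ['string', 'test']
# find all matches: one list per row, e.g. ['test', 'string']
df['matches'] = df.string.str.findall('|'.join(keyword_list))
# forward-fill the keyword so every row has a group, flatten the group's
# match lists and keep the most common word
df.groupby(df.keyword.ffill()).matches.apply(lambda x: ''.join(mode(list(chain.from_iterable(x)))[0]))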
keyword
A string
B test
Name: matches, dtype: object
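Recent SciPy releases deprecated non-numeric input to `scipy.stats.mode` and changed its return shape, so the `mode(...)[0]` idiom above may fail on a current install. A minimal sketch of the same groupby that swaps in `collections.Counter` from the standard library for SciPy (a different technique from the answer's) and then uses `reset_index` to turn the result into a new dataframe with the columns from the question. The helper `most_common_word` is a hypothetical name, and the NaN layout of the keyword column is an assumption.

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

# Assumed layout: keyword only on the first row of each group.
df = pd.DataFrame({
    "keyword": ["A", np.nan, np.nan, "B", np.nan, np.nan],
    "string": [
        "This is a test string for the example",
        "This is also a test string based on the other string",
        "This string is a test string based on the other strings",
        "You can probably guess that this is also a test string",
        "Yet again, another test string",
        "This is also a test",
    ],
})
keyword_list = ["string", "test"]

df["matches"] = df["string"].str.findall("|".join(keyword_list))


def most_common_word(lists):
    """Most frequent match across all rows of one group (hypothetical helper)."""
    counter = Counter(chain.from_iterable(lists))
    # most_common(1) returns [(word, count)]; ties keep first-seen order.
    return counter.most_common(1)[0][0] if counter else None


# Series indexed by keyword -> DataFrame with keyword / High_freq_word columns.
df2 = (
    df.groupby(df["keyword"].ffill())["matches"]
      .apply(most_common_word)
      .rename("High_freq_word")
      .reset_index()
)
print(df2)
```

Because the result is built as a new DataFrame rather than appended to df1, the duplicated groupby rows never reach df2.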