Want to use a where clause like SQL in Python
I have a corpus of text that needs to be analysed. I have a data frame with the headers below.
print((df.columns.values))
>>>> ['Unique ID' 'Quarter' 'Theme' 'Subtheme' 'Driver' 'Ticker' 'Company'
'Sub-sector' 'Issue weight' 'Quote' 'Executive name' 'Designation'
'Quote_len' 'word_count']
I have written a function to find the top 20 words in the 'Quote' column after removing stop words.
from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_words(corpus, n=None):
    # Build the vocabulary, ignoring English stop words
    vec = CountVectorizer(stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total occurrences of each word across the whole corpus
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]
common_words = get_top_n_words(df['Quote'].values.astype('U'), 20)
for word, freq in common_words:
    print(word, freq)
df2 = pd.DataFrame(common_words, columns=['ReviewText', 'count'])
# .iplot() is provided by the cufflinks package
df2.groupby('ReviewText').sum()['count'].sort_values(ascending=False).iplot(
    kind='bar', yTitle='Count', linecolor='black',
    title='Top 20 words in review after removing stop words')
Now I wish to use a where clause within the code to restrict the results to a given value of the "Theme" column.
For example:
Theme= 'Competitive advantage'
How can I do that?
Use DataFrame.loc[...] to filter down your results. For example:
    df = df.loc[df.Theme == 'Competitive advantage']
Then continue with:
    common_words = get_top_n_words(df['Quote'].values.astype('U'), 20)
The dataframe now only includes rows where Theme == 'Competitive advantage', so the top words are computed from those quotes alone.
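To see the filter on its own, here is a minimal sketch on a toy frame. The rows are invented for illustration and only the column names 'Theme' and 'Quote' come from the question. It keeps the result in a separate variable (filtered, a name chosen here) so the original df stays available for other themes, and it also shows DataFrame.query as an equivalent spelling.

```python
import pandas as pd

# Toy data standing in for the real frame; these rows are made up.
df = pd.DataFrame({
    "Theme": ["Competitive advantage", "Pricing", "Competitive advantage"],
    "Quote": ["strong brand moat", "prices rose", "brand loyalty helps"],
})

# Boolean mask: the pandas counterpart of
#   WHERE Theme = 'Competitive advantage'
mask = df["Theme"] == "Competitive advantage"
filtered = df.loc[mask]  # a new frame; df itself is left unchanged

# Equivalent filter written with DataFrame.query
filtered_q = df.query("Theme == 'Competitive advantage'")

# The quotes that would be passed on to get_top_n_words
quotes = filtered["Quote"].tolist()
```

Bracket access (df["Theme"]) is used instead of df.Theme because it also works for column names containing spaces, such as 'Unique ID'.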