Want to use a where clause like SQL in Python

I have a corpus of text that needs to be analysed. I have a data frame with the headers below.

print(df.columns.values)
['Unique ID' 'Quarter' 'Theme' 'Subtheme' 'Driver' 'Ticker' 'Company'
 'Sub-sector' 'Issue weight' 'Quote' 'Executive name' 'Designation'
 'Quote_len' 'word_count']

I have written a function to find the top 20 words in the 'Quote' column after removing stop words.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_words(corpus, n=None):
    # Learn the vocabulary, dropping English stop words
    vec = CountVectorizer(stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total occurrences of each word across all documents
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    # Most frequent words first
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

common_words = get_top_n_words(df['Quote'].values.astype('U'), 20)
for word, freq in common_words:
    print(word, freq)

df2 = pd.DataFrame(common_words, columns=['ReviewText', 'count'])
# .iplot is not part of pandas itself; it is assumed here to come from the cufflinks package
df2.groupby('ReviewText').sum()['count'].sort_values(ascending=False).iplot(
    kind='bar', yTitle='Count', linecolor='black', title='Top 20 words in review after removing stop words')

Now I wish to use a where clause within the code so that the results are restricted to a given value of the column "Theme".

For example, Theme = 'Competitive advantage'.

How can I do that?

Use DataFrame.loc[...] to filter down your results.

For example: df = df.loc[df.Theme == 'Competitive advantage']
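
For readers coming from SQL, here is a minimal sketch of a few ways to write a WHERE clause on "Theme" in pandas. The sample rows are invented for illustration; only the column names 'Theme', 'Quarter' and 'Quote' come from the question.

```python
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame({
    "Theme": ["Competitive advantage", "Growth", "Competitive advantage", "Risk"],
    "Quarter": ["Q1", "Q1", "Q2", "Q2"],
    "Quote": ["our moat is wide", "sales grew", "brand strength matters", "rates may rise"],
})

# WHERE Theme = 'Competitive advantage'
by_mask = df.loc[df["Theme"] == "Competitive advantage"]

# The same filter written as a query string
by_query = df.query("Theme == 'Competitive advantage'")

# WHERE Theme = '...' AND Quarter = 'Q2' (each condition needs its own parentheses)
both = df.loc[(df["Theme"] == "Competitive advantage") & (df["Quarter"] == "Q2")]

# WHERE Theme IN ('Growth', 'Risk')
in_list = df.loc[df["Theme"].isin(["Growth", "Risk"])]

print(len(by_mask), len(by_query), len(both), len(in_list))
```

Bracket indexing such as df["Theme"] also works for column names that are not valid Python identifiers, like 'Issue weight' or 'Sub-sector', where attribute access in the style of df.Theme does not.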

Then continue with common_words = get_top_n_words(df['Quote'].values.astype('U'), 20), but now the data frame will only include rows where Theme == 'Competitive advantage'.
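
One caveat: assigning the filtered result back to df discards the rows for every other theme. A minimal sketch of an alternative keeps the original frame and filters inside a helper. To stay self-contained it uses a simplified word counter (the hypothetical count_top_words, which splits on whitespace and removes no stop words) in place of get_top_n_words; top_words_for_theme is also a made-up name, and the sample rows are invented.

```python
from collections import Counter

import pandas as pd


def count_top_words(quotes, n):
    """Hypothetical stand-in for get_top_n_words: whitespace tokens, no stop-word removal."""
    counts = Counter(word for quote in quotes for word in str(quote).lower().split())
    return counts.most_common(n)


def top_words_for_theme(frame, theme, n=20):
    """Return the top n words in 'Quote' for rows whose 'Theme' equals theme."""
    subset = frame.loc[frame["Theme"] == theme]  # the original frame is left untouched
    return count_top_words(subset["Quote"], n)


# Invented sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Theme": ["Competitive advantage", "Growth", "Competitive advantage"],
    "Quote": ["scale drives margin", "growth drives hiring", "scale protects margin and scale wins"],
})

top = top_words_for_theme(df, "Competitive advantage", n=2)
print(top)
```

In the real pipeline, replacing the call to count_top_words with get_top_n_words(subset['Quote'].values.astype('U'), n) should give the stop-word-filtered counts, and the helper can then be called once per theme.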
