Extracting values from pandas dataframe based on list of strings
I'm trying to filter a pandas DataFrame that has a column of news headlines (column name 'title'), keeping only the rows whose headline contains any of the company names from a list ('co_names_list').
I've already tried the following:
    sp500news = pd.DataFrame()
    for i in raw_news_2.index:
        for j in co_names_list:
            if j in raw_news_2.loc[i,'title']:
                sp500news = sp500news.append(raw_news_2.iloc[i])
    print(sp500news)

    sp500news = raw_news_2.loc[raw_news_2['title'].isin(co_names_list)]
I think this should do what you want:
    df[df.title.str.contains('|'.join(co_names_list))]
What this does is check, for each sentence in title, whether any of the words in co_names_list are contained in the sentence. That is done by joining all the words in the list with a '|', which acts as an OR operator in the regular expression that str.contains evaluates.
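As a minimal sketch of the approach above, the snippet below builds the OR pattern and applies it to a small frame. The sample headlines and company names are made up for illustration. The re.escape call and the na=False argument are additions not present in the answer: since str.contains treats the pattern as a regular expression by default, escaping should keep names containing characters such as '.' from being read as regex syntax, and na=False should keep missing titles from breaking the boolean mask.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for raw_news_2 and co_names_list.
raw_news_2 = pd.DataFrame(
    {
        "title": [
            "Apple unveils new iPhone",
            "Weather update for Tuesday",
            "Amazon.com reports earnings",
            "Microsoft wins cloud contract",
        ]
    }
)
co_names_list = ["Apple", "Microsoft", "Amazon.com"]

# Join the names with '|' (regex OR), escaping each one so that
# metacharacters such as '.' are matched literally.
pattern = "|".join(re.escape(name) for name in co_names_list)

# Boolean mask: True where the title contains at least one company name.
# na=False treats missing titles as non-matches.
mask = raw_news_2["title"].str.contains(pattern, na=False)

sp500news = raw_news_2[mask]
print(sp500news)
```

By contrast, isin tests whether the whole cell value equals one of the list entries, so it only matches a headline that is exactly a company name, not one that merely contains it.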