Checking if column in dataframe contains any item from list of strings
My goal is to check a dataframe column and, if it contains any item from a list of strings (matches in the example), create a new dataframe containing all of the rows that match.
With my current code I'm able to grab the column values that match; however, they end up in a plain list, and I want a new dataframe that keeps the rest of the information I previously had.
Here is my current code. Rather than ending up with a list, I want the entire dataframe information I previously had:
matches = ['beat saber', 'half life', 'walking dead', 'population one']
checking = []
for x in hot_quest1['all_text']:
    if any(z in x for z in matches):
        checking.append(x)
Pandas generally allows you to filter data frames without resorting to for loops.
This is one approach that should work:
matches = ['beat saber', 'half life', 'walking dead', 'population one']
# matches_regex is a regular expression meaning any of your strings:
# "beat saber|half life|walking dead|population one"
matches_regex = "|".join(matches)
# matches_bools will be a series of booleans indicating whether there was a match
# for each item in the series
matches_bools = hot_quest1.all_text.str.contains(matches_regex, regex=True)
# You can then use that series of booleans to derive a new data frame
# containing only matching rows
matched_rows = hot_quest1[matches_bools]
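To try this end to end, here is a minimal, self-contained sketch of the same steps. The sample rows and the score column are made up for illustration; only the hot_quest1 name, the all_text column, and the matches list come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real hot_quest1 frame.
# Only the all_text column name is taken from the question.
hot_quest1 = pd.DataFrame({
    "all_text": [
        "beat saber is a great workout",
        "looking for a new headset",
        "half life alyx review",
        "population one squad tips",
    ],
    "score": [10, 3, 7, 5],
})

matches = ['beat saber', 'half life', 'walking dead', 'population one']

# Build one alternation pattern from the list of strings.
matches_regex = "|".join(matches)

# Boolean series: True where all_text contains any of the strings.
matches_bools = hot_quest1["all_text"].str.contains(matches_regex, regex=True)

# Boolean indexing keeps every column of the matching rows.
matched_rows = hot_quest1[matches_bools]
print(matched_rows)
```

Because the boolean series is used to index the whole frame, the result keeps the other columns (here the made-up score column) instead of collapsing to a list of strings.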
Here's the documentation for the str.contains method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
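A few of the parameters documented there may matter on real data. The sketch below is one possible variant, not part of the original answer: it escapes each search string with re.escape so regex metacharacters are matched literally, passes case=False to ignore case, and passes na=False so missing values count as non-matches. The helper name filter_rows_containing_any and the sample frame are hypothetical.

```python
import re

import pandas as pd


def filter_rows_containing_any(df, column, terms):
    """Return the rows of df whose column contains any of the given terms.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Escape each term so characters such as "+" or "(" are matched literally.
    pattern = "|".join(re.escape(term) for term in terms)
    # case=False ignores case; na=False treats missing values as non-matches.
    mask = df[column].str.contains(pattern, case=False, na=False, regex=True)
    return df[mask]


# Made-up sample data, including a missing value and a term with metacharacters.
sample = pd.DataFrame(
    {"all_text": ["Beat Saber (PC)", None, "c++ tips", "nothing here"]}
)
result = filter_rows_containing_any(sample, "all_text", ["beat saber", "c++"])
print(result)
```

Without the escaping step, a term such as "c++" would be read as regex syntax rather than as literal text, so whether to escape depends on whether the list is meant to hold plain strings or patterns.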