
Checking string for words in list stored in Pandas Dataframe

I have a pandas dataframe containing a list of strings in a column called contains_and. Now I want to select the rows from that dataframe whose words in contains_and are all contained in a given string, e.g.:

import pandas as pd

example: str = "I'm really satisfied with the quality and the price of product X"

df: pd.DataFrame = pd.DataFrame({"columnA": [1,2], "contains_and": [["price","quality"],["delivery","speed"]]})

resulting in a dataframe like this:

   columnA       contains_and
0        1   [price, quality]
1        2  [delivery, speed]

Now, I would like to select only the first row (index 0), as example contains all words in that row's contains_and list.

My initial instinct was to do the following:

df.loc[
    all([word in example for word in df["contains_and"]])
    ]

However, doing that results in the following error:

TypeError: 'in <string>' requires string as left operand, not list

I'm not quite sure how to best do this, but it seems like something that shouldn't be all too difficult. Does someone know of a good way to do this?
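The error seems to come from the iteration itself: looping over df["contains_and"] yields each row's whole list rather than individual words, so the comparison becomes `list in str`. A minimal sketch reproducing that with the data from the question (the names first and raised are only illustrative):

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

# Iterating over the column yields whole lists, not single words
first = next(iter(df["contains_and"]))
print(type(first).__name__)  # list

# `list in str` is the operation that raises the TypeError from the question
try:
    first in example
    raised = False
except TypeError:
    raised = True
```

So the membership test has to run once per word inside each row's list, which is what the answers below do.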

One way:

df = df[df.contains_and.apply(lambda x: all((i in example) for i in x))]

Output:

   columnA      contains_and
0        1  [price, quality]
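To make the per-row logic easier to inspect, here is a self-contained sketch of the same idea that builds the boolean mask first. The names mask and result are just for illustration, and it assumes the data from the question:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

# One boolean per row: True when every word of the row's list occurs in `example`
mask = df["contains_and"].apply(lambda words: all(w in example for w in words))
result = df[mask]

print(mask.tolist())  # [True, False]

# `in` on strings is a substring test, so fragments of words also match
print("qual" in example)  # True
```

Because `in` matches substrings, a list entry such as "qual" would also count as contained. If only whole words should match, comparing against the split words of example is probably the safer choice.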

Another way is exploding the list of candidate words and checking (per row) whether they are all among the words of example, which are found with str.split:

# a Series of words
ex = pd.Series(example.split())

# boolean array reduced with `all`
to_keep = df["contains_and"].explode().isin(ex).groupby(level=0).all()

# keep only "True" rows
new_df = df[to_keep]

to get

>>> new_df

   columnA      contains_and
0        1  [price, quality]
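It may help to look at the intermediate results of that chain. The following sketch splits it into steps, again on the data from the question; the variable names exploded and found are only illustrative:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({"columnA": [1, 2],
                   "contains_and": [["price", "quality"], ["delivery", "speed"]]})

words = pd.Series(example.split())

# One word per row; the original row index is repeated for each word
exploded = df["contains_and"].explode()
# True where the word is one of the whitespace-separated words of `example`
found = exploded.isin(words)
# Collapse back to one boolean per original row
to_keep = found.groupby(level=0).all()

print(exploded.index.tolist())  # [0, 0, 1, 1]
print(found.tolist())           # [True, True, False, False]
print(to_keep.tolist())         # [True, False]
```

Unlike the substring check, this compares whole words. Note that str.split only splits on whitespace, so punctuation stays attached: a sentence containing "price," would not match "price" without extra cleaning.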

Based on @Nk03's answer, you could also try:

df = df[df.contains_and.apply(lambda x: any([q for q in x if q in example]))]

In my opinion, it is more intuitive to check whether the words are in example, rather than the opposite, as in your first attempt.
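One thing to keep in mind with this variant: any keeps a row as soon as a single word from its list appears in example, whereas the question asks for all words. On the sample data both give the same result, but they differ for a row with mixed matches. A small sketch, where mixed is a hypothetical row not present in the question:

```python
example = "I'm really satisfied with the quality and the price of product X"

# Hypothetical row: "price" occurs in `example`, "delivery" does not
mixed = ["price", "delivery"]

any_match = any(q in example for q in mixed)
all_match = all(q in example for q in mixed)

print(any_match)  # True
print(all_match)  # False
```

If every word must be present, swapping any for all in the lambda should give the behaviour the question describes.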
