Check a string for words in a list stored in a Pandas DataFrame

Question

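I have a Pandas DataFrame with a column named contains_and that holds lists of strings. I want to select the rows of that DataFrame where every word in contains_and appears in a given string, for example: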

example: str = "I'm really satisfied with the quality and the price of product X"

df: pd.DataFrame = pd.DataFrame({"columnA": [1,2], "contains_and": [["price","quality"],["delivery","speed"]]})

which yields this DataFrame:

   columnA       contains_and
0        1   [price, quality]
1        2  [delivery, speed]

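Now I want to select only the first row (index 0), since example contains every word in that row's contains_and list.

My initial instinct was to do the following: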

df.loc[
    all([word in example for word in df["contains_and"]])
    ]

However, this results in the following error:

TypeError: 'in <string>' requires string as left operand, not list

I'm not sure how best to do this, but it seems like it shouldn't be too difficult. Does anyone know a good way to do it?

Answer 1

One way:

df = df[df.contains_and.apply(lambda x: all(i in example for i in x))]

Output:

   columnA      contains_and
0        1  [price, quality]
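
For reference, here is a minimal self-contained sketch of the same row-wise check, using the data from the question. It also shows why the original attempt failed: iterating over the column yields one list per row rather than one word. The names first_item, mask and selected are only illustrative.

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
df = pd.DataFrame({
    "columnA": [1, 2],
    "contains_and": [["price", "quality"], ["delivery", "speed"]],
})

# Iterating over the column yields one list per row, not one word,
# which is why `word in example` raised the TypeError in the question.
first_item = next(iter(df["contains_and"]))

# Build one boolean per row: True when every word of that row occurs in `example`.
mask = df["contains_and"].apply(lambda words: all(w in example for w in words))

# Boolean indexing keeps only the rows where the mask is True.
selected = df[mask]
print(selected)
```

Note that `w in example` is a substring test on the whole sentence, so a fragment such as "qual" would also count as a match.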

Answer 2

Another way is to explode the lists of candidate words and check, per row, whether all of them are among the words of example, which are obtained with str.split:

# a Series of words
ex = pd.Series(example.split())

# boolean array reduced with `all`
to_keep = df["contains_and"].explode().isin(ex).groupby(level=0).all()

# keep only "True" rows
new_df = df[to_keep]

to get

>>> new_df

   columnA      contains_and
0        1  [price, quality]
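
The intermediate steps of the explode approach can be sketched as follows. Because example is split into tokens first, this variant matches whole words only, whereas a plain `in` test on the string also matches fragments. The second row below is hypothetical data chosen to show that difference, and the variable names are illustrative.

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
# Hypothetical data: "qual" is a fragment of "quality", not a whole word.
df = pd.DataFrame({
    "columnA": [1, 2],
    "contains_and": [["price", "quality"], ["price", "qual"]],
})

# Whitespace-separated tokens of the sentence
tokens = pd.Series(example.split())

# One row per candidate word; the original row index is repeated
exploded = df["contains_and"].explode()

# True where the candidate word equals one of the tokens
found = exploded.isin(tokens)

# Collapse back to one boolean per original row
whole_word = found.groupby(level=0).all()

# Substring test on the raw string, for comparison
substring = df["contains_and"].apply(lambda ws: all(w in example for w in ws))

print(whole_word.tolist())  # [True, False]
print(substring.tolist())   # [True, True]
```

Which behaviour is preferable depends on the data. Also note that str.split does not strip punctuation, so a token like "price," would not equal "price".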

Answer 3

Building on @Nk03's answer, you could also try:

df = df[df.contains_and.apply(lambda x: all([q in example for q in x]))]

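In my opinion, it is more intuitive to check whether the words are in example rather than the other way around, as in your first attempt.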
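
The choice between any and all matters as soon as only some of a row's words occur in example. A small sketch, with a hypothetical third row added to the question's data for illustration:

```python
import pandas as pd

example = "I'm really satisfied with the quality and the price of product X"
# Hypothetical third row: only one of its two words occurs in `example`.
df = pd.DataFrame({
    "columnA": [1, 2, 3],
    "contains_and": [
        ["price", "quality"],
        ["delivery", "speed"],
        ["price", "delivery"],
    ],
})

# True when at least one word of the row occurs in `example`
any_mask = df["contains_and"].apply(lambda ws: any(w in example for w in ws))

# True only when every word of the row occurs in `example`
all_mask = df["contains_and"].apply(lambda ws: all(w in example for w in ws))

print(any_mask.tolist())  # [True, False, True]
print(all_mask.tolist())  # [True, False, False]
```

Since the question asks for rows whose words are all contained in the string, all is the reduction that matches the requirement.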

Problem description

3 solutions

Solution 1
1 accepted 2021-06-22 15:05:15

Solution 2
1 2021-06-22 15:12:07

Solution 3
0 2021-06-22 16:17:41
