How can I check in Pandas whether items in a list are in another list?

Question

I have two different pandas DataFrames:

df_1 with columns id (int), name (string), description (string)

and df_2 with columns id (int), name (string), description (string).

The names in df_1 and df_2 are only similar, not identical, and I want to join the two DataFrames using df_1's id.

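In both DataFrames I created a new column called splitted_name, which holds the list of words from the name column.

Now I want to check whether at least one element of df_1.splitted_name is in df_2.splitted_name. How can I do this in Pandas?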
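At its core, the per-row test is whether two word lists share at least one element. A minimal plain-Python sketch, assuming both inputs are lists of words like the splitted_name values; `shares_word` is a hypothetical helper name, not a pandas function:

```python
def shares_word(words_a, words_b):
    """Return True if the two word lists have at least one element in common."""
    # isdisjoint() is True when the sets share nothing, so negate it
    return not set(words_a).isdisjoint(words_b)


print(shares_word(['alone', 'in', 'the', 'jungle'],
                  ['alone', 'in', 'the', 'jungle', 'remastered']))  # True
print(shares_word(['born', 'by', 'the', 'sea'],
                  ['goodbye', 'my', 'love']))  # False
```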

Sample data:

df_1

    name                       name_split
1   Alone in the jungle       ['alone','in','the','jungle']
2   Born by the sea           ['born','by','the','sea']

df_2

    name                       name_split
1   Goodbye my love           ['goodbye','my','love']
2   Alone in the jungle remastered ['alone','in','the','jungle','remastered']
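With the sample data, one alternative to comparing every pair of rows is to explode each list column into one row per word and merge on the word itself, so only rows that share a word meet. This is a sketch of that swapped-in technique, not the asker's code; it assumes the index values 1 and 2 play the role of the id column and that names are lowercased and split on whitespace:

```python
import pandas as pd

# Sample data from the question; index values 1 and 2 stand in for the ids (assumption)
df_1 = pd.DataFrame({'name': ['Alone in the jungle', 'Born by the sea']}, index=[1, 2])
df_2 = pd.DataFrame({'name': ['Goodbye my love', 'Alone in the jungle remastered']},
                    index=[1, 2])
for df in (df_1, df_2):
    df['name_split'] = df['name'].str.lower().str.split()

# One row per (row id, word) pair
words_1 = df_1['name_split'].explode().rename('word').rename_axis('id_1').reset_index()
words_2 = df_2['name_split'].explode().rename('word').rename_axis('id_2').reset_index()

# Rows that share a word meet in the inner merge; drop_duplicates removes
# repeats when two rows share more than one word
pairs = words_1.merge(words_2, on='word')[['id_1', 'id_2']].drop_duplicates()
print(pairs)
```

For this data the result holds the id pairs (1, 2) and (2, 2). The second pair appears only because both "Born by the sea" and "Alone in the jungle remastered" contain "the", so very common words may need to be filtered out before the merge.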

Answer 1

You should first combine them into a single DataFrame and then run the comparison. I made my own example with these datasets:

import pandas as pd

df1 = pd.DataFrame(data=[['John Black'], ['Sara Smith'], ['Jane Jane']], columns=['name'])
df2 = pd.DataFrame(data=[['John Smith'], ['Sara Midname Smith'], ['Emma Sunshine']], columns=['name'])
# Split each name into a list of words
df1['splitted_name'] = df1.name.str.split(' ')
df2['splitted_name'] = df2.name.str.split(' ')

Create a DataFrame with all possible combinations:

# Pair every row of df1 with every row of df2 (Cartesian product)
df = []
for i in df1.values:
    for j in df2.values:
        df.append(i.tolist() + j.tolist())
df = pd.DataFrame(df)
df.columns = ['name1', 'splitted_name1', 'name2', 'splitted_name2']
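On pandas 1.2 or later, the same table of all combinations can be built without the nested loop using `merge(how='cross')`. A minimal sketch reusing the example frames; the `suffixes` argument is what produces the name1/name2 column names here:

```python
import pandas as pd

df1 = pd.DataFrame(data=[['John Black'], ['Sara Smith'], ['Jane Jane']], columns=['name'])
df2 = pd.DataFrame(data=[['John Smith'], ['Sara Midname Smith'], ['Emma Sunshine']],
                   columns=['name'])
df1['splitted_name'] = df1.name.str.split(' ')
df2['splitted_name'] = df2.name.str.split(' ')

# how='cross' (pandas >= 1.2) builds the Cartesian product directly, left row major;
# suffixes rename the overlapping columns to name1/name2 and splitted_name1/splitted_name2
df = df1.merge(df2, how='cross', suffixes=('1', '2'))
print(df.columns.tolist())
# ['name1', 'splitted_name1', 'name2', 'splitted_name2']
```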

Finally, compare the split names:

# True when at least one word of splitted_name2 also appears in splitted_name1
result = df.apply(lambda x: (pd.Index(x.splitted_name1).unique().get_indexer(x.splitted_name2) >= 0).any(), axis=1).rename('result')
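If the `apply` call turns out slow on larger frames, a plain list comprehension over the two list columns is one alternative. This sketch swaps in `set.isdisjoint` for the `get_indexer` test and assumes pandas 1.2+ for the cross merge used to build the pairs:

```python
import pandas as pd

df1 = pd.DataFrame(data=[['John Black'], ['Sara Smith'], ['Jane Jane']], columns=['name'])
df2 = pd.DataFrame(data=[['John Smith'], ['Sara Midname Smith'], ['Emma Sunshine']],
                   columns=['name'])
df1['splitted_name'] = df1.name.str.split(' ')
df2['splitted_name'] = df2.name.str.split(' ')
df = df1.merge(df2, how='cross', suffixes=('1', '2'))

# zip walks both list columns row by row; a pair matches unless the sets are disjoint
result = pd.Series(
    [not set(a).isdisjoint(b) for a, b in zip(df.splitted_name1, df.splitted_name2)],
    index=df.index, name='result')
print(result.tolist())
# [True, False, False, True, True, False, False, False, False]
```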

Output:

0     True
1    False
2    False
3     True
4     True
5    False
6    False
7    False
8    False
Name: result, dtype: bool

You can also store it as a new column in the DataFrame:

df['result'] = result

Then filter the rows you need:

df = df[df.result]
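Since the question's goal is to carry df_1's id over to the matching df_2 rows, the filtered pairs can keep both id columns. A sketch assuming hypothetical `id` columns added to the answer's example frames (the values 10/20/30 and 1/2/3 are made up for illustration) and pandas 1.2+ for the cross merge:

```python
import pandas as pd

# Hypothetical ids, added only for illustration
df1 = pd.DataFrame({'id': [10, 20, 30],
                    'name': ['John Black', 'Sara Smith', 'Jane Jane']})
df2 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John Smith', 'Sara Midname Smith', 'Emma Sunshine']})
df1['splitted_name'] = df1.name.str.split(' ')
df2['splitted_name'] = df2.name.str.split(' ')

# All combinations; overlapping columns become id1/id2, name1/name2, ...
df = df1.merge(df2, how='cross', suffixes=('1', '2'))
df['result'] = [not set(a).isdisjoint(b)
                for a, b in zip(df.splitted_name1, df.splitted_name2)]

# Keep only matching pairs; id1 is the df1 id to attach to the df2 row
matches = df.loc[df.result, ['id1', 'name1', 'id2', 'name2']]
print(matches)
```

Here "John Smith" (id2 = 1) matches both id1 = 10 and id1 = 20, so a single shared word can map one df2 row to several df1 ids.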

Answer 1 (score: 0, 2022-07-08 12:30:50)
