How can I check in Pandas if an item from a list is in another list?

I have two different pandas DataFrames:

df_1 with columns id(int), name(string), description(string)

and df_2 with columns id(int), name(string), description(string)

The names in df_1 and df_2 are similar but not identical, and I would like to link the two data frames using the id from df_1.

I have created a new column in both data frames called splitted_name that holds the list of words from the name column.

Now I would like to check whether at least one element from df_1.splitted_name is in df_2.splitted_name. How can I do this in Pandas?

sample data:

df_1

    name                       name_split
1   Alone in the jungle       ['alone','in','the','jungle']
2   Born by the sea           ['born','by','the','sea']

df_2

    name                             name_split
1   Goodbye my love                  ['goodbye','my','love']
2   Alone in the jungle remastered   ['alone','in','the','jungle','remastered']

You should first combine them into a single DataFrame and then compare the word lists row by row. I built my own example with these datasets:

import pandas as pd

df1 = pd.DataFrame(data=[['John Black'], ['Sara Smith'], ['Jane Jane']], columns=['name'])
df2 = pd.DataFrame(data=[['John Smith'], ['Sara Midname Smith'], ['Emma Sunshine']], columns=['name'])
# Split each name into a list of words
df1['splitted_name'] = df1.name.str.split(' ')
df2['splitted_name'] = df2.name.str.split(' ')

Create a data frame with all possible combinations:

# Pair every row of df1 with every row of df2 (Cartesian product)
rows = []
for i in df1.values:
    for j in df2.values:
        rows.append(i.tolist() + j.tolist())
df = pd.DataFrame(rows, columns=['name1', 'splitted_name1', 'name2', 'splitted_name2'])
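
On pandas 1.2 or newer, the same table of combinations can probably be built without explicit loops by calling merge with how='cross'. This is a sketch of an alternative, not part of the original answer; the suffixes are chosen only to reproduce the column names used above.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John Black', 'Sara Smith', 'Jane Jane']})
df2 = pd.DataFrame({'name': ['John Smith', 'Sara Midname Smith', 'Emma Sunshine']})
df1['splitted_name'] = df1['name'].str.split(' ')
df2['splitted_name'] = df2['name'].str.split(' ')

# how='cross' pairs every row of df1 with every row of df2 (requires pandas >= 1.2).
# Overlapping column names receive the suffixes '1' and '2'.
pairs = df1.merge(df2, how='cross', suffixes=('1', '2'))
print(pairs.columns.tolist())
print(len(pairs))
```

For large frames the number of pairs grows as len(df1) * len(df2), so this approach is only practical when both frames are reasonably small.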

And finally compare the split names:

# True where the two word lists share at least one word
result = df.apply(lambda x: (pd.Index(x.splitted_name1).unique().get_indexer(x.splitted_name2) >= 0).any(), axis=1).rename('result')

Output:

0     True
1    False
2    False
3     True
4     True
5    False
6    False
7    False
8    False
Name: result, dtype: bool
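
The same check can also be expressed with plain Python sets, which some readers may find easier to follow than the Index.get_indexer approach. This is a minimal sketch, assuming each cell holds a list of strings. The helper name shares_word is hypothetical, and the small frame below only rebuilds three of the name pairs so that the snippet runs on its own.

```python
import pandas as pd

# Three of the name pairs from the combined frame, rebuilt here for a standalone example
df = pd.DataFrame({
    'splitted_name1': [['John', 'Black'], ['Sara', 'Smith'], ['Jane', 'Jane']],
    'splitted_name2': [['John', 'Smith'], ['Sara', 'Midname', 'Smith'], ['Emma', 'Sunshine']],
})

def shares_word(a, b):
    """Return True if the two word lists have at least one word in common."""
    return not set(a).isdisjoint(b)

result = pd.Series(
    [shares_word(a, b) for a, b in zip(df['splitted_name1'], df['splitted_name2'])],
    index=df.index,
    name='result',
)
print(result.tolist())
```

The comparison is case-sensitive, so lowercasing the names before splitting (as in the question's sample data) may be worth considering.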

You can also store it as a new column in the data frame:

df['result'] = result

And then keep only the rows you need:

df = df[df.result]
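
To get back to the original goal of linking the two frames by the id from df_1, the id columns can be carried through the combination step. The sketch below assumes pandas 1.2 or newer and uses the sample names from the question with made-up id values; the resulting column names id_1 and id_2 come from the suffixes chosen here and are purely illustrative.

```python
import pandas as pd

# Sample names from the question; the id values are made up for illustration
df_1 = pd.DataFrame({'id': [1, 2], 'name': ['Alone in the jungle', 'Born by the sea']})
df_2 = pd.DataFrame({'id': [1, 2], 'name': ['Goodbye my love', 'Alone in the jungle remastered']})

# Lowercase and split each name into a list of words
for frame in (df_1, df_2):
    frame['name_split'] = frame['name'].str.lower().str.split()

# Build every (df_1 row, df_2 row) pair, keeping both id columns
pairs = df_1.merge(df_2, how='cross', suffixes=('_1', '_2'))

# Flag pairs whose word lists share at least one word
pairs['result'] = [
    not set(a).isdisjoint(b)
    for a, b in zip(pairs['name_split_1'], pairs['name_split_2'])
]

matches = pairs.loc[pairs['result'], ['id_1', 'name_1', 'id_2', 'name_2']]
print(matches)
```

With this data, "Born by the sea" also matches "Alone in the jungle remastered" because both contain "the", so a stricter rule than "at least one shared word" may be needed in practice.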

