How to check whether the elements of a list are in another list when both lists are in pandas columns

Question

Given the dataframe:

import numpy as np
import pandas as pd

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']]}

df = pd.DataFrame(data=d)

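I want to create a third column, col3, that contains every element of the col2 list that also appears in the col1 list of the same row, and np.nan otherwise.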

It must collect every matching element.

In this case, col3 would be:

           col1                      col2                           col3
0   ['how', 'are', 'you']      ['tell', 'how', 'me', 'you']       ['how', 'you']
1   ['im', 'fine', 'thanks']   ['who', 'cares']                   [np.nan] 
2   ['you', 'know']            ['know', 'this', 'padewan']        ['know']
3   [np.nan]                   ['who', 'are', 'you']              [np.nan]

I tried:

df['col3'] = [c in l for c, l in zip(df['col1'], df['col2'])]

This doesn't work at all, so any ideas would be very helpful.
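The attempt above returns False for every row: `c in l` asks whether the whole list `c` is one of the elements of `l`, not whether the two lists share any elements. A minimal sketch of the element-wise check, assuming the question's `df` and plain string words, iterates over the words of col2 instead and keeps the matches in the order they appear in col2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
    'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']],
})

# What the attempt computes: is the whole list c one single element of l?
whole_list_check = [c in l for c, l in zip(df['col1'], df['col2'])]
print(whole_list_check)  # [False, False, False, False]

# Element-wise check: keep each word of col2 that also occurs in col1
# (in col2 order), falling back to [np.nan] when nothing matches.
df['col3'] = [[w for w in l if w in c] or [np.nan]
              for c, l in zip(df['col1'], df['col2'])]
print(df)
```

Each `w in c` scans the list `c`, so for long lists converting `c` to a `set` first would be the usual optimization.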

Answer 1

Something like this:

df['col3'] = [list(set(a).intersection(b)) for a, b in zip(df.col1, df.col2)]

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [you, how]
1  [im, fine, thanks]           [who, cares]          []
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]          []
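Answer 1 leaves an empty list where the question asked for [np.nan], and because it goes through a set, the order inside each list is not guaranteed (row 0 came out as [you, how]). A minimal post-processing sketch, assuming the matches are strings so that sorted() can compare them, sorts each result and substitutes [np.nan] for empty ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
    'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']],
})

# Answer 1: set intersection per row (order inside each list is not guaranteed).
df['col3'] = [list(set(a).intersection(b)) for a, b in zip(df.col1, df.col2)]

# Hypothetical post-processing step: sort each match for a stable order and
# replace empty results with [np.nan], as the question asked.
# Assumes the matches are strings, so sorted() can compare them.
df['col3'] = [sorted(m) if m else [np.nan] for m in df['col3']]
print(df)
```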

Answer 2

Another version:

df['col3'] = df.apply(lambda x: [*set(x['col1']).intersection(x['col2'])] or [np.nan], axis=1)

print(df)

Prints:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]
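Answer 2 relies on two small idioms: `[*s]` unpacks a set into a new list, and an empty list is falsy, so `matches or [np.nan]` yields [np.nan] exactly when nothing matched. The same logic pulled out into a plain function (the name `overlap_or_nan` is hypothetical) makes that easy to check outside the DataFrame:

```python
import numpy as np


def overlap_or_nan(a, b):
    # [*s] unpacks the set of shared elements into a new list.
    matches = [*set(a).intersection(b)]
    # An empty list is falsy, so `or` falls through to [np.nan] when nothing matches.
    return matches or [np.nan]


print(overlap_or_nan(['you', 'know'], ['know', 'this', 'padewan']))  # ['know']
print(overlap_or_nan(['im', 'fine', 'thanks'], ['who', 'cares']))    # [nan]
```

Such a function could then be applied per row, either through `df.apply(..., axis=1)` as in the answer or with a list comprehension over `zip(df['col1'], df['col2'])`.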

Answer 3

I would write a separate function with the help of np.intersect1d and apply it:

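def intersect_nan(a, b):
    # np.intersect1d returns the sorted unique values common to a and b
    ret = np.intersect1d(a, b)
    # Fall back to [np.nan] when the two lists share nothing
    return list(ret) if len(ret) > 0 else [np.nan]

df['col3'] = [intersect_nan(a, b) for a, b in zip(df['col1'], df['col2'])]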

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]
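Two details of np.intersect1d matter here: it returns the sorted, unique common values as a NumPy array, so `list(ret)` yields `np.str_` items rather than plain `str`, and for row 3 it is asked to combine a float array (from np.nan) with a string array, a type promotion I would not rely on across NumPy versions. A hedged variant, `intersect_nan_strings` (a hypothetical name), assumes only string entries should ever match, drops everything else before calling NumPy, and uses `.tolist()` to get plain Python strings:

```python
import numpy as np
import pandas as pd


def intersect_nan_strings(a, b):
    # Hypothetical variant of intersect_nan: keep only string entries so NumPy
    # never has to combine a float array (np.nan) with a string array.
    a = [x for x in a if isinstance(x, str)]
    b = [x for x in b if isinstance(x, str)]
    if not a or not b:
        return [np.nan]
    # np.intersect1d returns the sorted, unique common values as an ndarray;
    # .tolist() converts its items back to plain Python str.
    ret = np.intersect1d(a, b).tolist()
    return ret if ret else [np.nan]


df = pd.DataFrame({
    'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
    'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']],
})
df['col3'] = [intersect_nan_strings(a, b) for a, b in zip(df['col1'], df['col2'])]
print(df)
```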

Answer 4

Something like this:

import numpy
import pandas

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [numpy.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'],
              ['who', 'are', 'you']]}
df = pandas.DataFrame(d)
list_col3 = []
for index, row in df.iterrows():
    a_set = set(row['col1'])
    b_set = set(row['col2'])
    # Compute the overlap once and reuse it
    common = a_set.intersection(b_set)
    if len(common) > 0:
        list_col3.append(list(common))
    else:
        list_col3.append([numpy.nan])
df['col3'] = list_col3
print(df)

Output:

                 col1                   col2        col3
0     [how, are, you]   [tell, how, me, you]  [how, you]
1  [im, fine, thanks]           [who, cares]       [nan]
2         [you, know]  [know, this, padewan]      [know]
3               [nan]        [who, are, you]       [nan]

4 solutions

Solution 1: 4 votes, 2020-05-05 20:42:53

Solution 2: 3 votes, accepted, 2020-05-05 20:43:40

Solution 3: 2 votes, 2020-05-05 20:38:31

Solution 4: 1 vote, 2020-05-05 21:24:40
