How to check if an element of one list is in another when they are in pandas columns
Given a dataframe:
import numpy as np
import pandas as pd

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']]}
df = pd.DataFrame(data=d)
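I want to create a third column, col3, that holds every element of the col2 list that is also contained in the col1 list of the same row, and np.nan otherwise. It must take all matching elements, not just one. In this case, col3 would be: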
   col1                      col2                          col3
0  ['how', 'are', 'you']     ['tell', 'how', 'me', 'you']  ['how', 'you']
1  ['im', 'fine', 'thanks']  ['who', 'cares']              [np.nan]
2  ['you', 'know']           ['know', 'this', 'padewan']   ['know']
3  [np.nan]                  ['who', 'are', 'you']         [np.nan]
I tried:
df['col3'] = [c in l for c, l in zip(df['col1'], df['col2'])]
This doesn't work at all, so any ideas would be very helpful.
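The attempt fails because `c in l` asks whether the whole list `c` is a single element of `l`, not whether the two lists share any items, so every row yields one boolean (False here). A minimal sketch contrasting the two checks on the first row; the variable names are illustrative only:

```python
# First row of col1 and col2 from the question
c = ['how', 'are', 'you']
l = ['tell', 'how', 'me', 'you']

# `in` tests whether the entire list c is one element of l
whole_list_member = c in l

# What is actually wanted: the items of l that also appear in c
overlap = [word for word in l if word in c]

print(whole_list_member)  # False
print(overlap)            # ['how', 'you']
```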
Something like this:
df['col3'] = [list(set(a).intersection(b)) for a, b in zip(df.col1, df.col2)]
Output:
col1 col2 col3
0 [how, are, you] [tell, how, me, you] [you, how]
1 [im, fine, thanks] [who, cares] []
2 [you, know] [know, this, padewan] [know]
3 [nan] [who, are, you] []
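Two details differ from the desired output: a set does not keep col2's order (hence `[you, how]` in row 0), and rows without a match come back as `[]` rather than `[np.nan]`. A sketch that addresses both, assuming col2's order should be preserved, iterates over col2 and uses a set built from col1 only for fast lookups. The helper name `ordered_overlap` is hypothetical, and repeated words in col2 would be kept as repeats:

```python
import numpy as np
import pandas as pd

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [np.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'], ['who', 'are', 'you']]}
df = pd.DataFrame(data=d)

def ordered_overlap(a, b):
    """Items of b that also occur in a, in b's order; [np.nan] if there are none."""
    lookup = set(a)  # constant-time membership tests against col1
    common = [item for item in b if item in lookup]
    return common or [np.nan]  # an empty list is falsy, so fall back to [np.nan]

df['col3'] = [ordered_overlap(a, b) for a, b in zip(df['col1'], df['col2'])]
print(df)
```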
Another version:
df['col3'] = df.apply(lambda x: [*set(x['col1']).intersection(x['col2'])] or [np.nan], axis=1 )
print(df)
Prints:
col1 col2 col3
0 [how, are, you] [tell, how, me, you] [how, you]
1 [im, fine, thanks] [who, cares] [nan]
2 [you, know] [know, this, padewan] [know]
3 [nan] [who, are, you] [nan]
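I would write a separate function with the help of np.intersect1d and apply it:
def intersect_nan(a, b):
    # sorted unique values present in both lists
    ret = np.intersect1d(a, b)
    # return plain Python values, or [np.nan] when nothing matches
    return ret.tolist() if len(ret) > 0 else [np.nan]

df['col3'] = [intersect_nan(a, b) for a, b in zip(df['col1'], df['col2'])]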
Output:
col1 col2 col3
0 [how, are, you] [tell, how, me, you] [how, you]
1 [im, fine, thanks] [who, cares] [nan]
2 [you, know] [know, this, padewan] [know]
3 [nan] [who, are, you] [nan]
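`np.intersect1d` returns the sorted, unique values common to both inputs, so matches come back in alphabetical order rather than in col2's order, and repeated words are collapsed. A quick check with made-up inputs; `.tolist()` converts the NumPy string scalars to ordinary Python strings:

```python
import numpy as np

# intersect1d sorts and de-duplicates its result
common = np.intersect1d(['you', 'how', 'you'], ['tell', 'you', 'how'])
print(common.tolist())  # ['how', 'you']
```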
Something like this:
import numpy
import pandas

d = {'col1': [['how', 'are', 'you'], ['im', 'fine', 'thanks'], ['you', 'know'], [numpy.nan]],
     'col2': [['tell', 'how', 'me', 'you'], ['who', 'cares'], ['know', 'this', 'padewan'],
              ['who', 'are', 'you']]}
df = pandas.DataFrame(d)

list_col3 = []
for index, row in df.iterrows():
    a_set = set(row['col1'])
    b_set = set(row['col2'])
    common = a_set.intersection(b_set)  # compute the overlap once per row
    if len(common) > 0:
        list_col3.append(list(common))
    else:
        list_col3.append([numpy.nan])  # no shared element in this row
df['col3'] = list_col3
print(df)
Output:
col1 col2 col3
0 [how, are, you] [tell, how, me, you] [how, you]
1 [im, fine, thanks] [who, cares] [nan]
2 [you, know] [know, this, padewan] [know]
3 [nan] [who, are, you] [nan]
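All of the answers above treat `np.nan` as an ordinary list item. Python's `in` and set lookups check identity before equality, and `np.nan` is one shared object, so a row where both lists contained `np.nan` would report it as a genuine match. If missing values should never count as matches (an assumption; the question does not say), one option is to drop them with `pd.isna` before comparing. `overlap_skip_nan` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def overlap_skip_nan(a, b):
    """Items of b that also occur in a, ignoring missing values; [np.nan] if none remain."""
    # pd.isna on a scalar returns a single boolean, so it can filter item by item
    lookup = {item for item in a if not pd.isna(item)}
    common = [item for item in b if not pd.isna(item) and item in lookup]
    return common or [np.nan]

a = [np.nan, 'how']
b = [np.nan, 'how', 'me']

# Plain membership test: np.nan is found in set(a) because it is the same object
lookup = set(a)
naive = [item for item in b if item in lookup]

print(naive)                   # [nan, 'how']
print(overlap_skip_nan(a, b))  # ['how']
```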
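Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.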