How to check whether the values in a column of one DataFrame are present in a column of another DataFrame?
I have two DataFrames:
import pandas as pd

df1_data = {'sym1': {0: 'abc a01', 1: 'pqr q02', 2: 'xyz y03', 3: 'mno o12', 4: 'lmn l45'}}
df1 = pd.DataFrame(df1_data)
print(df1)
df2_data = {'sym2': {0: 'abc a01', 1: 'xxx p0', 2: 'xyz y03', 3: 'mno o12', 4: 'lmn l45', 5: 'rrr r1', 6: 'kkk k3'}}
df2 = pd.DataFrame(df2_data)
print(df2)
Output:
      sym1
0  abc a01
1  pqr q02
2  xyz y03
3  mno o12
4  lmn l45
      sym2
0  abc a01
1   xxx p0
2  xyz y03
3  mno o12
4  lmn l45
5   rrr r1
6   kkk k3
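I want to check whether the values in the sym2 column of df2 are present in the sym1 column of df1. If some symbols from sym2 are missing from sym1, I want a list of those missing symbols. If all symbols are present, the list must be empty.
Expected result: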
list -> ['xxx p0','rrr r1','kkk k3']
You can use boolean indexing with isin, select the sym2 column with loc (the original answer used ix, which has since been removed from pandas), and convert the result to a list with tolist:
print(~df2.sym2.isin(df1.sym1))
0    False
1     True
2    False
3    False
4    False
5     True
6     True
Name: sym2, dtype: bool
print(df2.loc[~df2.sym2.isin(df1.sym1), 'sym2'])
1    xxx p0
5    rrr r1
6    kkk k3
Name: sym2, dtype: object
print(df2.loc[~df2.sym2.isin(df1.sym1), 'sym2'].tolist())
['xxx p0', 'rrr r1', 'kkk k3']
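The isin approach above can be wrapped in a small helper. This is a minimal sketch for current pandas versions; the function name missing_values is hypothetical (it is not part of pandas), and the DataFrames are rebuilt from plain lists so the block runs on its own. It also covers the empty-list case from the question.

```python
import pandas as pd


def missing_values(candidates: pd.Series, reference: pd.Series) -> list:
    """Return the values of `candidates` that do not occur in `reference`.

    Hypothetical helper for illustration; order and duplicates of
    `candidates` are preserved.
    """
    mask = ~candidates.isin(reference)  # True where the value is absent
    return candidates[mask].tolist()


df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12',
                             'lmn l45', 'rrr r1', 'kkk k3']})

missing = missing_values(df2['sym2'], df1['sym1'])
print(missing)  # ['xxx p0', 'rrr r1', 'kkk k3']

# When every value is present, the result is an empty list.
none_missing = missing_values(df1['sym1'], df1['sym1'])
print(none_missing)  # []
```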
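Here is another, faster solution. Note that Index.difference computes a set difference and sorts the result by default, so the values come back in a different order than with isin: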
In [54]: df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values
Out[54]: array(['kkk k3', 'rrr r1', 'xxx p0'], dtype=object)
Or as a plain Python list:
In [74]: df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values.tolist()
Out[74]: ['kkk k3', 'rrr r1', 'xxx p0']
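The same set difference can also be written without set_index by building the Index objects directly from the columns. This is a sketch of an equivalent formulation, not the answer's original code; the variable names diff and diff_list are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12',
                             'lmn l45', 'rrr r1', 'kkk k3']})

# Wrap each column in an Index and take the set difference.
# With the default sort behaviour the result comes back sorted.
diff = pd.Index(df2['sym2']).difference(pd.Index(df1['sym1']))
diff_list = diff.tolist()
print(diff_list)  # ['kkk k3', 'rrr r1', 'xxx p0']
```

Because the result is sorted rather than kept in the original row order, use the isin version when the order of df2 matters.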
Timings for DataFrames with 700K and 500K rows:
In [55]: df1 = pd.concat([df1] * 10**5, ignore_index=True)
In [57]: df2 = pd.concat([df2] * 10**5, ignore_index=True)
In [58]: df1.shape
Out[58]: (500000, 1)
In [59]: df2.shape
Out[59]: (700000, 1)
In [67]: %timeit df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values
10 loops, best of 3: 123 ms per loop
In [68]: %timeit df2.ix[~df2.sym2.isin(df1.sym1), 'sym2']
1 loop, best of 3: 216 ms per loop
In [72]: %timeit df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values.tolist()
10 loops, best of 3: 123 ms per loop
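To repeat the comparison on a current pandas installation, where ix is no longer available, something like the following sketch could be used. It relies on the standard-library timeit module instead of the IPython %timeit magic, uses loc in place of ix, and uses a smaller repeat factor (10**4, an arbitrary choice) to keep the run short. The function names via_isin and via_difference are made up for this example, and no particular timings are implied.

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12',
                             'lmn l45', 'rrr r1', 'kkk k3']})

# Repeat the small frames to get larger inputs (50K and 70K rows here).
big1 = pd.concat([df1] * 10**4, ignore_index=True)
big2 = pd.concat([df2] * 10**4, ignore_index=True)


def via_isin():
    # Boolean mask, keeps one entry per matching row of big2
    return big2.loc[~big2['sym2'].isin(big1['sym1']), 'sym2']


def via_difference():
    # Set difference on Index objects
    return pd.Index(big2['sym2']).difference(pd.Index(big1['sym1']))


for fn in (via_isin, via_difference):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per call')

# Both approaches should agree on which symbols are missing.
same_values = set(via_isin()) == set(via_difference())
print(same_values)
```

Timings depend on the pandas version, the data and the hardware, so the figures from the session above should not be expected to reproduce exactly.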