
How to check whether the values of a column in one dataframe are available in a column of another dataframe?

I have two dataframes:

import pandas as pd

df1_data = {'sym1' :{0:'abc a01',1:'pqr q02',2:'xyz y03',3:'mno o12',4:'lmn l45'}}
df1 = pd.DataFrame(df1_data)
print(df1)

df2_data = {'sym2' :{0:'abc a01',1:'xxx p0',2:'xyz y03',3:'mno o12',4:'lmn l45',5:'rrr r1',6:'kkk k3'}}
df2 = pd.DataFrame(df2_data)
print(df2)

Output:

      sym1
0  abc a01
1  pqr q02
2  xyz y03
3  mno o12
4  lmn l45
      sym2
0  abc a01
1   xxx p0
2  xyz y03
3  mno o12
4  lmn l45
5   rrr r1
6   kkk k3

I want to check whether each value in df2's sym2 column is present in df1's sym1 column. If some symbols in sym2 are not present in sym1, I want a list of those missing symbols. If all symbols are present, the list must be empty.

Expected result:

list -> ['xxx p0','rrr r1','kkk k3']
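As a plain-Python reference point for the expected result (a sketch, not one of the pandas answers), the requirement can be written as a set membership test. It keeps df2's row order and yields an empty list when every symbol is present:

```python
import pandas as pd

df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12', 'lmn l45',
                             'rrr r1', 'kkk k3']})

# Build the lookup set once so each membership test is O(1) on average
available = set(df1['sym1'])

# Keep df2's order; duplicated missing values in sym2 would be kept too
missing = [sym for sym in df2['sym2'] if sym not in available]
print(missing)  # ['xxx p0', 'rrr r1', 'kkk k3']
```
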

You can use boolean indexing with isin, then select with loc (the older ix indexer was removed in pandas 1.0) and convert to a list with tolist:

print(~df2.sym2.isin(df1.sym1))
0    False
1     True
2    False
3    False
4    False
5     True
6     True
Name: sym2, dtype: bool

print(df2.loc[~df2.sym2.isin(df1.sym1), 'sym2'])
1    xxx p0
5    rrr r1
6    kkk k3
Name: sym2, dtype: object

print(df2.loc[~df2.sym2.isin(df1.sym1), 'sym2'].tolist())
['xxx p0', 'rrr r1', 'kkk k3']
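Wrapped as a reusable helper, the same isin/loc approach also covers the "list must be empty" case. This is a sketch; the function name `missing_symbols` and its column-name parameters are hypothetical:

```python
import pandas as pd


def missing_symbols(df1, df2, col1='sym1', col2='sym2'):
    """Return values of df2[col2] absent from df1[col1], in df2's row order."""
    # True where the df2 value does not occur anywhere in df1[col1]
    mask = ~df2[col2].isin(df1[col1])
    return df2.loc[mask, col2].tolist()


df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12', 'lmn l45',
                             'rrr r1', 'kkk k3']})

print(missing_symbols(df1, df2))  # ['xxx p0', 'rrr r1', 'kkk k3']

# When every sym2 value is present in sym1, the result is an empty list
subset = pd.DataFrame({'sym2': ['abc a01', 'xyz y03']})
print(missing_symbols(df1, subset))  # []
```
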

Here is another, slightly faster, solution. Note that Index.difference returns unique values in sorted order, so the result is not in df2's row order:

In [54]: df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values
Out[54]: array(['kkk k3', 'rrr r1', 'xxx p0'], dtype=object)

or as a plain Python list:

In [74]: df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values.tolist()
Out[74]: ['kkk k3', 'rrr r1', 'xxx p0']
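If the sorted order is not wanted, recent pandas versions let `Index.difference` take `sort=False`, which keeps the order in which values first appear. It also accepts a plain array-like as `other`, so `set_index` is not strictly needed. A sketch, assuming a recent pandas version:

```python
import pandas as pd

df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12', 'lmn l45',
                             'rrr r1', 'kkk k3']})

# Default (sort=None): result is sorted, as in the Out[74] result
sorted_diff = pd.Index(df2['sym2']).difference(df1['sym1']).tolist()
print(sorted_diff)  # ['kkk k3', 'rrr r1', 'xxx p0']

# sort=False: keep the order in which values first appear in df2
ordered_diff = pd.Index(df2['sym2']).difference(df1['sym1'], sort=False).tolist()
print(ordered_diff)  # ['xxx p0', 'rrr r1', 'kkk k3']
```
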

Timings for DataFrames with 500K rows (df1) and 700K rows (df2):

In [55]: df1 = pd.concat([df1] * 10**5, ignore_index=True)

In [57]: df2 = pd.concat([df2] * 10**5, ignore_index=True)

In [58]: df1.shape
Out[58]: (500000, 1)

In [59]: df2.shape
Out[59]: (700000, 1)

In [67]: %timeit df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values
10 loops, best of 3: 123 ms per loop

In [68]: %timeit df2.ix[~df2.sym2.isin(df1.sym1), 'sym2']
1 loop, best of 3: 216 ms per loop

In [72]: %timeit df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values.tolist()
10 loops, best of 3: 123 ms per loop
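To rerun a comparison like this outside IPython, and on pandas versions where `ix` no longer exists, a rough sketch with the standard `timeit` module might look like the following. It substitutes `loc` for `ix` and uses the same 10**5 replication factor. Keep in mind that on the replicated data the two expressions do not return the same thing: the isin version keeps all 300K non-matching rows, while `Index.difference` returns only the 3 unique values. Timings depend on hardware and pandas version, so none are shown:

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({'sym1': ['abc a01', 'pqr q02', 'xyz y03', 'mno o12', 'lmn l45']})
df2 = pd.DataFrame({'sym2': ['abc a01', 'xxx p0', 'xyz y03', 'mno o12', 'lmn l45',
                             'rrr r1', 'kkk k3']})

# Replicate to 500K and 700K rows, as in the In [55] and In [57] lines
df1 = pd.concat([df1] * 10**5, ignore_index=True)
df2 = pd.concat([df2] * 10**5, ignore_index=True)


def with_isin():
    # Keeps every non-matching row (300K here), duplicates included
    return df2.loc[~df2.sym2.isin(df1.sym1), 'sym2']


def with_difference():
    # Returns only the unique, sorted non-matching values (3 here)
    return df2.set_index('sym2').index.difference(df1.set_index('sym1').index).values


for func in (with_isin, with_difference):
    # Report the best of three single runs of each approach
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f'{func.__name__}: {best * 1000:.0f} ms')
```
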


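Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.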

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM