Fastest way to check whether IDs in a DataFrame exist in another collection

Question

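I have a pandas DataFrame (about one million rows) and a list of IDs (an array of length 100,000). For each ID in the DataFrame, I need to check whether it is in my list (called special) and flag it accordingly: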

df['Segment'] = df['ID'].apply(lambda x: 1 if x in special else np.nan)

The problem is that this is very slow: for each of the million IDs, the lambda checks whether that ID is in a list of 100,000 entries. Is there a faster way to do this?

Answer 1

df['Segment'] = df['ID'].isin(special).astype(int)
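
Note that this one-liner yields 0 rather than NaN for IDs that are not in special, whereas the lambda in the question yields NaN. A minimal sketch of both variants follows; the sample frame and ID list are made up for illustration and stand in for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real frame and ID list.
df = pd.DataFrame({"ID": [10, 20, 30, 40, 50]})
special = [20, 50, 99]

# Boolean mask: True where the ID appears in `special`.
mask = df["ID"].isin(special)

# 0/1 flag, as produced by isin(...).astype(int).
df["Segment"] = mask.astype(int)

# 1/NaN flag, matching the output of the lambda in the question.
# The column name "SegmentNaN" is only illustrative.
df["SegmentNaN"] = np.where(mask, 1.0, np.nan)

print(df)
```

If the 1/NaN form is what you need, the column ends up as float, since NaN cannot be stored in an integer column.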

We can also use Series.view (deprecated since pandas 2.2, where astype is the recommended replacement):

df['Segment'] = df['ID'].isin(special).view('uint8')

Or np.where:

df['Segment'] = np.where(df['ID'].isin(special), 1, 0)
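
As for why the original was slow: `x in special` on a list is a linear scan, so every row pays for walking up to 100,000 entries. The sketch below, on made-up sample data, compares a set-based version of the apply approach with the isin and np.where variants and checks that they agree. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real frame and ID list.
df = pd.DataFrame({"ID": [10, 20, 30, 40, 50]})
special = [20, 50, 99]

# A set gives hash-based membership tests instead of a linear scan,
# so even the row-by-row apply gets much cheaper per lookup.
special_set = set(special)
via_apply = df["ID"].apply(lambda x: 1 if x in special_set else 0)

# Vectorized variants: no Python-level loop over the rows.
via_isin = df["ID"].isin(special).astype(int)
via_where = pd.Series(np.where(df["ID"].isin(special), 1, 0), index=df.index)

# All three approaches should produce the same flags.
print(via_apply.tolist() == via_isin.tolist() == via_where.tolist())
```

On a frame with a million rows, the vectorized forms are the ones to prefer. The set-based apply is mainly useful when the per-row logic cannot be expressed as a vectorized operation.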