
NotImplementedError is thrown when I use isin with Dask data frames

Let's say I have two Dask data frames:

import dask.dataframe as dd
import pandas as pd

dd_1 = dd.from_pandas(pd.DataFrame({'a': [1, 2, 3], 'b': [6, 7, 8]}), npartitions=1)

dd_2 = dd.from_pandas(pd.DataFrame({'a': [1, 2, 5], 'b': [3, 7, 1]}), npartitions=1)

Now I want to filter the first one using the values of column a in the second one:

dd_1[dd_1.a.isin(dd_2.a)]

When I try this, the following error is thrown:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-38-850f035e0842> in <module>
----> 1 dd_1[dd_1.a.isin(dd_2.a)]

/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py in isin(self, values)
   2113     @derived_from(pd.Series)
   2114     def isin(self, values):
-> 2115         return elemwise(M.isin, self, list(values))
   2116 
   2117     @insert_meta_param_description(pad=12)

/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py in __getitem__(self, key)
   2045             graph = HighLevelGraph.from_collections(name, dsk, dependencies=[self, key])
   2046             return Series(graph, name, self._meta, self.divisions)
-> 2047         raise NotImplementedError()
   2048 
   2049     @derived_from(pd.DataFrame)

NotImplementedError: 

Any suggestions?
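The last two frames of the traceback suggest where the error comes from: in this Dask version, Series.isin calls list(values). The Dask Series apparently has no __iter__ at this point, so Python's list() falls back to the old sequence protocol and asks for values[0]. Series.__getitem__ only accepts a boolean Series with matching divisions, so it raises NotImplementedError. The sketch below reproduces that fallback in plain Python. LazyColumn is a hypothetical stand-in for the Dask Series, not a Dask class.

```python
class LazyColumn:
    """Hypothetical stand-in for a lazy column: defines __getitem__ but no __iter__."""

    def __init__(self):
        self.calls = []  # records every key that __getitem__ receives

    def __getitem__(self, key):
        self.calls.append(key)
        # Like the Dask Series.__getitem__ in the traceback, refuse positional access.
        raise NotImplementedError()


col = LazyColumn()
try:
    # No __iter__, so list() falls back to col[0], col[1], ... until an IndexError.
    list(col)
except NotImplementedError:
    print("list() called __getitem__ with", col.calls)
```

Because the very first positional lookup raises NotImplementedError rather than IndexError, the error propagates out of list(), and therefore out of isin.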

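Using the latest version of Dask (2.9.1), my personal workaround is to convert the second Series (dd_2.a in your case) to pandas, for example with dd_2.a.compute(), and pass that to isin.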
