
Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64?

Here is my code:

play_count_with_title = pd.merge(df_count, df_small[['song_id', 'title', 'release']], on='song_id')

final_ratings = pd.merge(play_count_with_title, df_small[['song_id', 'artist_name']], on='song_id')

final_ratings

The error I get is:

Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64

The traceback shows the error being raised inside the pandas library:

File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:124, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     93 @Substitution("\nleft : DataFrame or named Series")
     94 @Appender(_merge_doc, indents=0)
     95 def merge(
   (...)
    108     validate: str | None = None,
    109 ) -> DataFrame:
    110     op = _MergeOperation(
    111         left,
    112         right,
   (...)
    122         validate=validate,
    123     )
--> 124     return op.get_result(copy=copy)

File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:773, in _MergeOperation.get_result(self, copy)
    770 if self.indicator:
    771     self.left, self.right = self._indicator_pre_merge(self.left, self.right)
--> 773 join_index, left_indexer, right_indexer = self._get_join_info()
    775 result = self._reindex_and_concat(
    776     join_index, left_indexer, right_indexer, copy=copy
    777 )
    778 result = result.__finalize__(self, method=self._merge_type)

File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1026, in _MergeOperation._get_join_info(self)
   1022     join_index, right_indexer, left_indexer = _left_join_on_index(
   1023         right_ax, left_ax, self.right_join_keys, sort=self.sort
   1024     )
   1025 else:
-> 1026     (left_indexer, right_indexer) = self._get_join_indexers()
   1028     if self.right_index:
   1029         if len(self.left) > 0:

File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1000, in _MergeOperation._get_join_indexers(self)
    998 def _get_join_indexers(self) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]:
    999     """return the join indexers"""
-> 1000     return get_join_indexers(
   1001         self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
   1002     )

File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1610, in get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
   1600 join_func = {
   1601     "inner": libjoin.inner_join,
   1602     "left": libjoin.left_outer_join,
   (...)
   1606     "outer": libjoin.full_outer_join,
   1607 }[how]
   1609 # error: Cannot call function of unknown type
-> 1610 return join_func(lkey, rkey, count, **kwargs)

File ~\anaconda3\lib\site-packages\pandas\_libs\join.pyx:48, in pandas._libs.join.inner_join()

As a beginner, I don't understand this error. Can you help me?

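Without a sample of the data it's hard to know exactly what is happening. However, this looks like the kind of problem you would see if both DataFrames contain many duplicate values in the merge key.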

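Note that when more than one row matches the same key during a merge, the merge emits every combination of the matching left and right rows.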

For example, here is a small 3-row DataFrame merged with itself. The result has 9 rows!

In [7]: df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})

In [8]: df.merge(df, 'left', on='a')
Out[8]:
   a  b_x  b_y
0  1    1    1
1  1    1    2
2  1    1    3
3  1    2    1
4  1    2    2
5  1    2    3
6  1    3    1
7  1    3    2
8  1    3    3
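Because each key value contributes (rows with that key on the left) × (rows with that key on the right), you can estimate the size of a merge before running it. The following is a minimal sketch: `inner_merge_row_count` is a hypothetical helper, not part of pandas, and it assumes an inner merge (the default for `pd.merge`) on a single key column with no missing values. For the `df` above, the left and inner merges happen to give the same count because every key has a match.

```python
import pandas as pd


def inner_merge_row_count(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Predict how many rows pd.merge(left, right, on=key) (inner merge) returns.

    Every key value k contributes left_count[k] * right_count[k] rows,
    so this sums that product over the keys present on both sides.
    Assumes the key column has no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Only keys present on both sides produce rows in an inner merge.
    common = left_counts.index.intersection(right_counts.index)
    return int((left_counts.loc[common] * right_counts.loc[common]).sum())


df = pd.DataFrame({'a': [1, 1, 1], 'b': [1, 2, 3]})
print(inner_merge_row_count(df, df, 'a'))  # 9, the same 9 rows as above
```

Calling something like `inner_merge_row_count(df_count, df_small, 'song_id')` before the first merge would show whether the result is likely to fit in memory, without allocating it.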

If your song_id column has many duplicates, the number of result rows can be as large as N^2, i.e. 154377**2 == 23832258129 in the worst case.
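The size in the error message is just the element count times 8 bytes per int64 value. A quick check in plain Python, assuming (as the traceback into `inner_join` suggests) that the failing allocation is an int64 index array that pandas builds for the join:

```python
# Assumption: each element of the failing array is an 8-byte int64.
BYTES_PER_INT64 = 8
GIB = 2 ** 30

n_elements = 27_939_587_241          # shape from the error message
needed_gib = n_elements * BYTES_PER_INT64 / GIB
print(f"{needed_gib:.2f} GiB")       # 208.17 GiB, the "208. GiB" NumPy reports

worst_case_rows = 154377 ** 2        # the N^2 bound from above
print(worst_case_rows)               # 23832258129
print(f"{worst_case_rows * BYTES_PER_INT64 / GIB:.1f} GiB")  # 177.6 GiB
```

That figure covers a single intermediate array; the merged DataFrame itself would need more memory on top of it.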

Try calling drop_duplicates('song_id') on each merge input to see what happens in that case.
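A minimal sketch of that check on small hypothetical data: the values below are invented, and only the column names come from the question. It counts repeated song_id values, shows how duplicates on both sides multiply the rows, then keeps one metadata row per song_id before merging. It also passes `validate='many_to_one'`, an existing `pd.merge` option that raises `pandas.errors.MergeError` if the right-hand keys are not unique. Merging title, release and artist_name in one step is a design choice made here, assuming df_count legitimately repeats song_id (for example, one row per user) while df_small should have one row per song.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df_count and df_small:
# song 'S1' appears twice in df_count and three times in df_small.
df_count = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'song_id': ['S1', 'S1', 'S2'],
    'play_count': [5, 2, 7],
})
df_small = pd.DataFrame({
    'song_id': ['S1', 'S1', 'S1', 'S2'],
    'title': ['A', 'A', 'A', 'B'],
    'release': ['R1', 'R1', 'R1', 'R2'],
    'artist_name': ['X', 'X', 'X', 'Y'],
})

# Number of rows whose song_id already appeared earlier in df_small.
print(df_small['song_id'].duplicated().sum())  # 2

# Without deduplication, each 'S1' row on the left pairs with each 'S1' row on the right.
print(len(pd.merge(df_count, df_small, on='song_id')))  # 7 = 2 * 3 + 1 * 1

# Keep one metadata row per song, then merge once; validate guards against
# any remaining duplicate song_id values in song_info.
song_info = df_small[['song_id', 'title', 'release', 'artist_name']].drop_duplicates('song_id')
final_ratings = pd.merge(df_count, song_info, on='song_id', validate='many_to_one')
print(len(final_ratings))  # 3, one row per row of df_count
```

With the real data, the first print shows how many duplicate song_id rows df_small has; if that number is large, deduplicating it is what keeps the merge from exploding.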


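Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.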

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM