Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64?
Here is my code:

```python
import pandas as pd

play_count_with_title = pd.merge(df_count, df_small[['song_id', 'title', 'release']], on='song_id')
final_ratings = pd.merge(play_count_with_title, df_small[['song_id', 'artist_name']], on='song_id')
final_ratings
```
The error I get is:

```
Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64
```
The library code that raises this error (traceback):

```
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:124, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
93 @Substitution("\nleft : DataFrame or named Series")
94 @Appender(_merge_doc, indents=0)
95 def merge(
(...)
108 validate: str | None = None,
109 ) -> DataFrame:
110 op = _MergeOperation(
111 left,
112 right,
(...)
122 validate=validate,
123 )
--> 124 return op.get_result(copy=copy)
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:773, in _MergeOperation.get_result(self, copy)
770 if self.indicator:
771 self.left, self.right = self._indicator_pre_merge(self.left, self.right)
--> 773 join_index, left_indexer, right_indexer = self._get_join_info()
775 result = self._reindex_and_concat(
776 join_index, left_indexer, right_indexer, copy=copy
777 )
778 result = result.__finalize__(self, method=self._merge_type)
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1026, in _MergeOperation._get_join_info(self)
1022 join_index, right_indexer, left_indexer = _left_join_on_index(
1023 right_ax, left_ax, self.right_join_keys, sort=self.sort
1024 )
1025 else:
-> 1026 (left_indexer, right_indexer) = self._get_join_indexers()
1028 if self.right_index:
1029 if len(self.left) > 0:
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1000, in _MergeOperation._get_join_indexers(self)
998 def _get_join_indexers(self) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]:
999 """return the join indexers"""
-> 1000 return get_join_indexers(
1001 self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
1002 )
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1610, in get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
1600 join_func = {
1601 "inner": libjoin.inner_join,
1602 "left": libjoin.left_outer_join,
(...)
1606 "outer": libjoin.full_outer_join,
1607 }[how]
1609 # error: Cannot call function of unknown type
-> 1610 return join_func(lkey, rkey, count, **kwargs)
File ~\anaconda3\lib\site-packages\pandas\_libs\join.pyx:48, in pandas._libs.join.inner_join()
```
As a beginner, I don't understand this error. Can you help me?
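Without a sample of the data it's hard to know exactly what is going on. However, this looks like the kind of problem you would see if the merge key (`song_id`) has many duplicate values in both DataFrames.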
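The size in the error message follows from the row count alone: the join builds integer indexer arrays (the `left_indexer` / `right_indexer` in the traceback) with, presumably, one int64 entry per output row, so a single such array for 27939587241 rows needs roughly 208 GiB. A quick check of that arithmetic, using only the numbers from the error message (NumPy reports binary GiB, i.e. 2**30 bytes):

```python
import numpy as np

# Shape reported in the MemoryError: one entry per row the join would produce.
n_rows = 27_939_587_241

# Each int64 index value takes 8 bytes.
bytes_per_item = np.dtype(np.int64).itemsize

# Convert to binary gibibytes, the unit NumPy uses in the message.
gib = n_rows * bytes_per_item / 2**30
print(f'{gib:.2f} GiB')  # 208.17 GiB
```

In other words, the problem is not a leak or a bug in pandas; the merge result itself would have about 28 billion rows.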
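Note that when multiple rows match during a merge, the merge emits every combination of the matching left and right rows.
For example, here is a 3-row DataFrame merged with itself. The result has 9 rows!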
```
In [7]: df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})
In [8]: df.merge(df, 'left', on='a')
Out[8]:
   a  b_x  b_y
0  1    1    1
1  1    1    2
2  1    1    3
3  1    2    1
4  1    2    2
5  1    2    3
6  1    3    1
7  1    3    2
8  1    3    3
```
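The question's code merges against `df_small` twice, so duplicates compound: if each `song_id` appears k times in `df_small`, the first merge multiplies the matching rows by k and the second multiplies them by k again. A minimal sketch with hypothetical toy data (the values are made up; only the column names come from the question):

```python
import pandas as pd

# Hypothetical toy data: two play-count rows for the same song,
# and three duplicate metadata rows for that song in df_small.
df_count = pd.DataFrame({
    'user_id': [1, 2],
    'song_id': ['s1', 's1'],
    'play_count': [5, 7],
})
df_small = pd.DataFrame({
    'song_id': ['s1'] * 3,
    'title': ['t'] * 3,
    'release': ['r'] * 3,
    'artist_name': ['a'] * 3,
})

# Same two-step pattern as the question.
step1 = pd.merge(df_count, df_small[['song_id', 'title', 'release']], on='song_id')
step2 = pd.merge(step1, df_small[['song_id', 'artist_name']], on='song_id')

print(len(df_count), len(step1), len(step2))  # 2 6 18
```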
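If your `song_id` column has many duplicates, the number of rows in the result can be as large as N^2, where N is the number of input rows; in the worst case that is 154377**2 == 23832258129.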
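You can check this before merging: an inner merge (the default for `pd.merge`, and the `inner_join` seen in the traceback) produces, for each key value, the number of left rows with that key times the number of right rows with it. A minimal sketch; `inner_merge_size` is a hypothetical helper that handles a single key column only and ignores missing keys (`value_counts` drops NaN, whereas pandas does match NaN keys to each other in a merge):

```python
import pandas as pd


def inner_merge_size(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Rows an inner merge on `key` would produce, computed without merging.

    Hypothetical helper: single key column only; NaN keys are ignored
    because value_counts() drops them by default.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Align the two count Series on key value; a key present on only one
    # side is filled with a count of 0 and so contributes nothing.
    pairs_per_key = left_counts.mul(right_counts, fill_value=0)
    return int(pairs_per_key.sum())


df = pd.DataFrame({'a': [1, 1, 1], 'b': [1, 2, 3]})
print(inner_merge_size(df, df, 'a'))  # 9
print(len(df.merge(df, on='a')))      # 9
```

On the question's data, `inner_merge_size(df_count, df_small, 'song_id')` should report the length of `play_count_with_title` without allocating it.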
Try calling `drop_duplicates('song_id')` on each merge input to see what happens in that case.
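Following that suggestion, here is a minimal sketch (with hypothetical toy data) that deduplicates the song metadata once and then merges a single time. It also passes `validate='many_to_one'`, an existing `pd.merge` option (it appears in the traceback's signature) that raises `pandas.errors.MergeError` when the right-hand keys are not unique, so an accidental blow-up fails fast instead of exhausting memory. Note that `drop_duplicates` keeps the first row per `song_id`; if duplicate rows carry different titles or artists, the others are discarded.

```python
import pandas as pd

# Hypothetical toy inputs standing in for the question's df_count / df_small.
df_count = pd.DataFrame({
    'user_id': [1, 2],
    'song_id': ['s1', 's2'],
    'play_count': [5, 7],
})
df_small = pd.DataFrame({
    'song_id': ['s1', 's1', 's2'],  # 's1' is duplicated
    'title': ['t1', 't1', 't2'],
    'release': ['r1', 'r1', 'r2'],
    'artist_name': ['a1', 'a1', 'a2'],
})

# Without deduplication, validate='many_to_one' refuses the merge.
try:
    pd.merge(df_count, df_small, on='song_id', validate='many_to_one')
    duplicates_found = False
except pd.errors.MergeError:
    duplicates_found = True

# Keep one metadata row per song_id (the first occurrence), then merge once.
songs = df_small[['song_id', 'title', 'release', 'artist_name']].drop_duplicates('song_id')
final_ratings = pd.merge(df_count, songs, on='song_id', validate='many_to_one')

print(duplicates_found)    # True
print(len(final_ratings))  # 2
```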