![](/img/trans.png)
[英]MemoryError: Unable to allocate 43.5 GiB for an array with shape (5844379795,) and data type int64
[英]Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64?
這是我的代碼:
play_count_with_title = pd.merge(df_count, df_small[['song_id', 'title', 'release']], on = 'song_id' )
final_ratings = pd.merge(play_count_with_title, df_small[['song_id', 'artist_name']], on = 'song_id' )
final_ratings
我得到的錯誤是
Unable to allocate 208. GiB for an array with shape (27939587241,) and data type int64
在庫中啟用此錯誤的代碼是
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:124, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
93 @Substitution("\nleft : DataFrame or named Series")
94 @Appender(_merge_doc, indents=0)
95 def merge(
(...)
108 validate: str | None = None,
109 ) -> DataFrame:
110 op = _MergeOperation(
111 left,
112 right,
(...)
122 validate=validate,
123 )
--> 124 return op.get_result(copy=copy)
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:773, in _MergeOperation.get_result(self, copy)
770 if self.indicator:
771 self.left, self.right = self._indicator_pre_merge(self.left, self.right)
--> 773 join_index, left_indexer, right_indexer = self._get_join_info()
775 result = self._reindex_and_concat(
776 join_index, left_indexer, right_indexer, copy=copy
777 )
778 result = result.__finalize__(self, method=self._merge_type)
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1026, in _MergeOperation._get_join_info(self)
1022 join_index, right_indexer, left_indexer = _left_join_on_index(
1023 right_ax, left_ax, self.right_join_keys, sort=self.sort
1024 )
1025 else:
-> 1026 (left_indexer, right_indexer) = self._get_join_indexers()
1028 if self.right_index:
1029 if len(self.left) > 0:
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1000, in _MergeOperation._get_join_indexers(self)
998 def _get_join_indexers(self) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]:
999 """return the join indexers"""
-> 1000 return get_join_indexers(
1001 self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
1002 )
File ~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py:1610, in get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
1600 join_func = {
1601 "inner": libjoin.inner_join,
1602 "left": libjoin.left_outer_join,
(...)
1606 "outer": libjoin.full_outer_join,
1607 }[how]
1609 # error: Cannot call function of unknown type
-> 1610 return join_func(lkey, rkey, count, **kwargs)
File ~\anaconda3\lib\site-packages\pandas\_libs\join.pyx:48, in pandas._libs.join.inner_join()
作為初學者,我不明白這個錯誤,你們能幫幫我嗎?
如果沒有數據樣本,很難知道發生了什么。 但是,如果兩個數據框中有很多重復值,這看起來像是您會看到的那種問題。
請注意,如果在合並期間有多行匹配,則合並會發出左右行的每個組合。
例如,這是一個 3 元素 DataFrame 與自身合並的小例子。 結果有9個元素!
In [7]: df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})
In [8]: df.merge(df, 'left', on='a')
Out[8]:
a b_x b_y
0 1 1 1
1 1 1 2
2 1 1 3
3 1 2 1
4 1 2 2
5 1 2 3
6 1 3 1
7 1 3 2
8 1 3 3
如果您的song_id
列中有很多重復項,那么元素的數量可能多達 N^2,即154377**2 == 23832258129
在最壞的情況下。
嘗試在每個合並輸入上使用drop_duplicates('song_id')
以查看在這種情況下會發生什么。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.