Python MemoryError: Unable to allocate 10.8 TiB for an array with shape () and data type int64
I am trying to combine two data sets using the following code:
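import pandas as pd
df1 = pd.read_csv('path1')  # 1456472 rows x 17 columns
df2 = pd.read_csv('path2')  # 1083899 rows x 42 columns
# Use a new name for the result; assigning to `pd` would shadow the pandas module
merged = pd.merge(left=df1, right=df2, how='left', on='id')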
It fails with this error:
MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64
How can I solve this on a laptop with a 500 GB disk and 8 GB of RAM? Thank you in advance.
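A note on the numbers in the error: 1483050607760 int64 values at 8 bytes each come to about 10.8 TiB, and that element count is close to the product of the two row counts (roughly 1.58 × 10^12). That would suggest, though the post does not confirm it, that `id` is heavily repeated in both files, so the left join pairs almost every row with almost every other row. The sketch below checks the arithmetic and estimates the merged row count from key frequencies without performing the merge. The helper name `estimate_left_merge_rows` and the toy frames are hypothetical, and the helper assumes the key column has no missing values.

```python
import pandas as pd

# Size of the array pandas tried to allocate: one int64 (8 bytes) per element.
n_elements = 1483050607760
size_tib = n_elements * 8 / 2**40  # about 10.8 TiB, matching the error message


def estimate_left_merge_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Estimate the row count of left.merge(right, how='left', on=key).

    Hypothetical helper; assumes the key column contains no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Each left row is paired with every right row sharing its key;
    # a left row with no match is still kept once.
    matches = right_counts.reindex(left_counts.index).fillna(1)
    return int((left_counts * matches).sum())


# Toy data (made up) with repeated ids on both sides.
left = pd.DataFrame({"id": [1, 1, 2, 3]})
right = pd.DataFrame({"id": [1, 1, 1, 2], "v": [10, 11, 12, 20]})
estimated_rows = estimate_left_merge_rows(left, right, "id")
```

If the estimate for the real files is in the trillions, the keys probably need deduplicating or aggregating before the merge, since no tool can hold that many rows on this laptop.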
Try Dask. If you want, you can then convert the result to a pandas DataFrame on another machine.
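# pip install "dask[dataframe]"
import dask.dataframe as dd
dd1 = dd.read_csv('path1')
dd2 = dd.read_csv('path2')
# Use a new name for the result; assigning to `dd` would shadow the dask.dataframe module
merged = dd.merge(left=dd1, right=dd2, how='left', on='id')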
Use this newer API to convert a sparse matrix directly to a DataFrame:
df = pd.DataFrame.sparse.from_spmatrix(vectorized_text)
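The answer does not define `vectorized_text`; presumably it is a SciPy sparse matrix such as the output of a text vectorizer. A self-contained sketch under that assumption, with a small hand-built matrix and made-up column names standing in for the real data:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical stand-in for `vectorized_text`: a 2 x 3 sparse count matrix.
vectorized_text = csr_matrix([[0, 2, 0], [1, 0, 0]])

# The columns keep a sparse dtype, so zeros are not stored explicitly.
df = pd.DataFrame.sparse.from_spmatrix(vectorized_text, columns=["a", "b", "c"])

# Fraction of entries that are actually stored (non-fill values).
density = df.sparse.density
```

This avoids materialising a dense array first, which is where a mostly-zero matrix would otherwise use up memory.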