
How to read a large data set from MongoDB into a pandas DataFrame

I have a large data set (about 9232363×102, roughly a 10 GB file). I have a system with 12 GB of RAM. How can I read this with pandas and convert it into a DataFrame? First I tried:

df = pd.DataFrame(list(mng_clxn.find({})))

It freezes my system.

So I tried to read only specific columns, but that was still no use. I read it like this:

df = pd.DataFrame(list(mng_clxn.find({}, {'col1': 1, 'col2': 1, 'col3': 1, 'col4': 1})))
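
(As a side note: pymongo returns the _id field by default, and pandas stores ObjectIds in a comparatively heavy object-dtype column. A minimal variant of the projection above that also drops _id, assuming the same mng_clxn collection:)

df = pd.DataFrame(list(mng_clxn.find(
    {}, {'_id': 0, 'col1': 1, 'col2': 1, 'col3': 1, 'col4': 1})))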

Another thing I tried was reading in chunks; for that:

df_db = pd.DataFrame()
offset = 0
thresh = 1000000
while offset < 9232363:
    chunk = pd.DataFrame(list(mng_clxn.find({}).limit(thresh).skip(offset)))
    offset += thresh
    df_db = df_db.append(chunk)

That was also no use. What should I do now?
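
(For reference, a sketch of the chunked read that avoids two common slowdowns in the loop above, assuming mng_clxn is the same pymongo collection: appending to a DataFrame inside the loop re-copies all earlier rows on every iteration, so it is cheaper to collect the chunks in a list and concatenate once; and skip() gets slower as the offset grows, so iterating one cursor with a server-side batch_size avoids repeated deep skips.)

import pandas as pd

thresh = 1000000
chunks, buffer = [], []
# One cursor over the whole collection; batch_size only controls how
# many documents the server returns per network round trip.
for doc in mng_clxn.find({}, batch_size=10000):
    buffer.append(doc)
    if len(buffer) >= thresh:
        chunks.append(pd.DataFrame(buffer))
        buffer = []
if buffer:
    chunks.append(pd.DataFrame(buffer))
# Single concat instead of repeated append.
df_db = pd.concat(chunks, ignore_index=True)

(This still materializes the full data set in RAM at the end, so it only helps if the data fits at all; the memory question is addressed in the answer below.)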

Can I solve this problem on my system (12 GB RAM)? Any ideas would be appreciated.

Feel free to mark this as a duplicate if you find any other SO questions similar to it.

Thanks in advance.

You'll likely need more memory to handle that dataset in a reasonable way. Consider running step 4 from this question to be sure. You might also look at this question about using pandas with large datasets, but generally you'll probably want well more than 2 GB of free space to manipulate the data even if you find a way to load it in.
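
(One way to check feasibility before committing to a full load, sketched under the assumption that mng_clxn is the collection from the question: pull a small sample, measure its in-memory footprint with pandas' memory_usage, and extrapolate to the full row count.)

import pandas as pd

# Sample size is a guess; large enough to average out row-size variance.
sample = pd.DataFrame(list(mng_clxn.find({}).limit(10000)))
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
est_gb = bytes_per_row * 9232363 / 1024 ** 3
print('Estimated full DataFrame size: %.1f GB' % est_gb)

(If the estimate exceeds what 12 GB of RAM can realistically hold, downcasting numeric columns, e.g. float64 to float32, or processing the collection chunk by chunk without ever concatenating are the usual fallbacks.)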
