
How to read a large data set from MongoDB into a pandas DataFrame

I have a large data set of roughly 9232363 × 102 (about a 10 GB file). My system has 12 GB of RAM. How can I read this with pandas and convert it to a DataFrame? First I tried:

df = pd.DataFrame(list(mng_clxn.find({})))

It froze my system.

So I tried to read only specific columns, but that didn't help either:

df = pd.DataFrame(list(mng_clxn.find({}, {'col1': 1, 'col2': 1, 'col3': 1, 'col4': 1})))

Another thing I tried was reading in chunks:

df_db = pd.DataFrame()
offset = 0
thresh = 1000000
while offset < 9232363:
    chunk = pd.DataFrame(list(mng_clxn.find({}).limit(thresh).skip(offset)))
    offset += thresh
    df_db = df_db.append(chunk)

That didn't work either. What should I do now?
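As a side note on the chunking attempt above: calling `.append()` inside the loop copies the whole accumulated DataFrame on every iteration, and `skip()` forces MongoDB to re-scan skipped documents each pass. A more memory-friendly pattern is to iterate the cursor once, build a DataFrame per batch, and concatenate a single time at the end. Here is a minimal sketch; the fake in-memory cursor stands in for a real `mng_clxn.find(...)` cursor, purely for illustration:

```python
import itertools
import pandas as pd

def frames_from_cursor(cursor, batch_size=100_000):
    """Yield a DataFrame for each successive batch of documents.

    `cursor` can be any iterable of dicts, e.g. a pymongo cursor
    returned by mng_clxn.find({}, {'_id': 0}).
    """
    while True:
        batch = list(itertools.islice(cursor, batch_size))
        if not batch:
            break
        yield pd.DataFrame(batch)

# Stand-in for a MongoDB cursor (hypothetical data, illustration only).
fake_cursor = iter({'col1': i, 'col2': i * 2} for i in range(250))

# One concat at the end instead of repeated .append() in a loop.
df = pd.concat(frames_from_cursor(fake_cursor, batch_size=100),
               ignore_index=True)
```

This walks the cursor exactly once, so MongoDB never has to re-scan documents the way repeated `skip()` calls make it do.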

Can I solve this problem on my system (12 GB of RAM)? Any ideas would be appreciated.

Feel free to mark as duplicate if you found any other SO questions similar to this.

Thanks in advance.

You'll likely need more memory to handle that dataset reasonably. Consider running step 4 from this question to be sure. You might also look at this question about using pandas with large datasets, but in general you'll probably want more than 2 GB of headroom to manipulate the data even if you do manage to load it.
