简体   繁体   中英

How can I work with a huge csv file that does not fit in the RAM with a combination of Power Query and Power Pivot on Excel 2010

I wonder if someone can help me with the following problem. First off my setup (which I can't change because it is a corporate environment)

  • Operating system: Windows 7 Professional. Service Pack 1. 32bit
  • Hardware: 8.00 GB RAM (2.73 Gb usable)

I am looking for a solution for slicing and dicing really big files (around 5Gb) with Excel. So the equivalent of being able to use pivot tables and graphs with so much data.

I just got Power Query and Power Pivot installed on my laptop (remember I'm running Excel 2010 - 32bit) and saved the huge .csv file as a connection with Power Query. However, I can't add it to my data model and use it from Power Pivot (apparently that is a problem in Excel 2010). I tried to get around by clicking on Power Pivot -> existing connections. But then it tries to import everything and I run out of memory or hit some Excel limit.

To me, the idea should be that the data is never loaded, that it is kept as a connection (where you only store the query) and that data is loaded "lazily" and only what you need after you set up the Power Pivot report (otherwise I don't know how Power Query and Power Pivot help to work with big files that wouldn't fit in Excel otherwise).

What can I do to add the connection to the huge file to the data model so that I can continue working until I can set up a report (with Power Pivot) and see the results?

If there is a software package I am missing (such as Power BI), that would help me fix the problem please let me know. If it's free (like Power Query and Power Pivot) I could have it installed.

Thank you very much in advance and regards

Since it sounds like the loading the CSV directly into power pivot failed due to importing everything. One option would be loading the CSV into an intermediary database like an access file (or SQL server Express) while cleaning up the data to improve memory usage.

Note: power pivot uses more memory when it is updating the model.

Link to Creating an memory efficient model which has some tips on how to design power pivot models to be more memory efficient.

Also note that uniqueness of data drastically effects how much data can fit in memory. Columns with less distinct values consume less memory, something that is unique like a row_ID on a fact table would consume a ton of memory.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM