
How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will continue to grow).

Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate (and I cannot turn off automatic calculation).

The main value is stored as a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range of values.

[Image: result]

The picture below shows the data sources. The top one cannot be changed, while the bottom one is something I created myself (so it can be changed). I use WEEKNUM to convert each date to a week number.

[Image: data]

Is there a better formula, or any other way to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a Pivot Table, but I don't know how to build one from this kind of data source.
PS. VBA is the last option.

You can download an example file here: https://www.mediafire.com/?t21s8ngn9mlme2d

I will post this answer with the disclaimer that it depends entirely on the size of the data set. Turning automatic calculation off and on is the best approach, but your question rules that out, so keep reading.

Your question made me curious, so I gave it a try and timed it. I set up two columns of over 100,000 random numbers, each between 1 and 1000, and then ran a count over the two columns for rows where both matched. I made a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.

First I tried your way, COUNTIFS with two criteria: [Image]

Then I tried to combine (concatenate) the two columns to see whether a single COUNTIF criterion and data set would make it faster. It doesn't. See the result below: [Image]

Finally, realizing what was going on, I decided to make the criteria match only the FIRST character of the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below: [Image]

Therefore my suggestion is to limit the length of the values you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best approach short of switching to manual calculation.
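For readers who want to reproduce something like this experiment outside Excel, here is a rough Python sketch. It is not the macro described above, and Python's timings say nothing about Excel's calculation engine. It only mirrors the setup (two columns of 100,000 random integers from 1 to 1000) and contrasts one full scan per report cell, which is roughly what each COUNTIFS formula has to do, with a single pass that tallies every combination at once, which is closer to what a Pivot Table does. The 50-cell report size and the names `countifs` and `tally` are assumptions made for illustration.

```python
import random
import time
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable
N_ROWS = 100_000

# Two "columns" of random integers from 1 to 1000, mirroring the experiment.
col_a = [random.randint(1, 1000) for _ in range(N_ROWS)]
col_b = [random.randint(1, 1000) for _ in range(N_ROWS)]


def countifs(a, b, want_a, want_b):
    """Scan both columns once and count rows that match both criteria."""
    return sum(1 for x, y in zip(a, b) if x == want_a and y == want_b)


def tally(a, b):
    """Single pass over the data that counts every (a, b) pair at once."""
    return Counter(zip(a, b))


# Hypothetical report: 50 cells, each asking for one (a, b) combination.
queries = [(random.randint(1, 1000), random.randint(1, 1000)) for _ in range(50)]

# One full scan per report cell.
start = time.perf_counter()
scan_results = [countifs(col_a, col_b, qa, qb) for qa, qb in queries]
scan_seconds = time.perf_counter() - start

# One pass to build the tally, then a dictionary lookup per report cell.
start = time.perf_counter()
pair_counts = tally(col_a, col_b)
tally_results = [pair_counts[q] for q in queries]
tally_seconds = time.perf_counter() - start

print(f"repeated scans: {scan_seconds:.3f}s, single tally: {tally_seconds:.3f}s")
```

Both approaches return the same counts. The difference is that the cost of the first grows with the number of report cells times the number of rows, while the second reads the data once.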

I have worked with Excel sheets of a similar size. Especially if you use the data on a regular basis, I would heartily recommend switching to a proper database: SQL-based, Access, or whatever fits your purpose. It does wonders for the speed, and you won't run into the size limits of Excel. :-)

You can import the data you have now fairly easily. I am happy as a clam with my PostgreSQL db.
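As a minimal sketch of what the database route could look like, the example below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL or Access. The table name `events`, the columns `item`, `year`, and `week`, and the sample rows are all hypothetical, since the real column layout is only visible in the question's screenshots. Note that `isocalendar()` gives ISO week numbers, which can differ from Excel's default `WEEKNUM` numbering.

```python
import sqlite3
from datetime import date

# Hypothetical rows: (item, event_date). The real layout is not shown in the text.
rows = [
    ("A", date(2016, 1, 4)),
    ("A", date(2016, 1, 5)),
    ("B", date(2016, 1, 6)),
    ("A", date(2016, 1, 12)),
    ("B", date(2016, 1, 13)),
    ("B", date(2016, 1, 14)),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (item TEXT, year INTEGER, week INTEGER)")

# Store the ISO year and ISO week number alongside each item.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(item, d.isocalendar()[0], d.isocalendar()[1]) for item, d in rows],
)

# One GROUP BY query produces every cell of the weekly count report.
report = conn.execute(
    "SELECT item, year, week, COUNT(*) FROM events "
    "GROUP BY item, year, week ORDER BY item, year, week"
).fetchall()

for item, year, week, n in report:
    print(item, year, week, n)
```

The single `GROUP BY` query takes the place of the grid of COUNTIFS formulas: the database reads the table once and returns one row per item and week.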
