
SumIf with lots of data in Excel

What I'm trying to do is a simple SUMIF over about 200k rows of data, which causes problems for Excel. Basically, my list looks like this:

List of Companies    Dummy1  Dummy2  
Company A             0       1
Company A             0       1
Company A             1       1 
Company B             1       1
Company B             0       1
Company B             0       1
....

If there is a 1 in any row of column B for a specific company, I need to put a 1 in every row of column C for that company.
So Dummy2 is basically the sum over Dummy1 for all entries of a specific company. The data is already sorted by column A. Anyway, Excel goes crazy. Is what I'm doing here just plain stupid because I'm generating too many comparison operations? What would be an easy way to accomplish this?

I think a better way to solve this is to use a pivot table: you can sum Dummy1 by company and get the data as a summary.

Here is an example:

http://www.excel-easy.com/data-analysis/pivot-tables.html
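To make the pivot-table idea concrete, here is a minimal Python sketch of what such a summary computes: one total of Dummy1 per company. It is not Excel code; the function name `summarize_by_company` and the six sample rows (taken from the question's table) are only illustrative assumptions.

```python
from collections import defaultdict

def summarize_by_company(rows):
    """Return {company: sum of Dummy1}, the same shape of result as a
    pivot table with the company in Rows and Sum of Dummy1 in Values."""
    totals = defaultdict(int)
    for company, dummy1 in rows:
        totals[company] += dummy1
    return dict(totals)

# Sample rows copied from the question: (company, Dummy1)
rows = [
    ("Company A", 0), ("Company A", 0), ("Company A", 1),
    ("Company B", 1), ("Company B", 0), ("Company B", 0),
]

totals = summarize_by_company(rows)
# The question wants a 1 wherever the company has at least one 1 in Dummy1.
flags = {company: int(total > 0) for company, total in totals.items()}

print(totals)
print(flags)
```

The point of the summary is that each company is totalled once, instead of once per row.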


I hope this helps.

According to your sample data, filling C2:C200000 with,

=SUMIF(A:A, A2, B:B)

... will be performing 3× as many SUMIF calculations as necessary. An IF formula only processes the branch (TRUE or FALSE) that its criteria resolve to, so changing the formula to something like the following,

=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)

... should drastically reduce the processing in a calculation cycle. The degree of improvement will depend upon how many duplicate company values are in column A and whether column A has been sorted to keep the company names together. The smaller the number of unique companies, the more improvement you will see. In short, unless the company changes from one row to the next, the SUMIF is not calculated.
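The effect of the IF guard can be modelled outside Excel. The following Python sketch is only an illustration under stated assumptions: `fill_naive` and `fill_guarded` are hypothetical names, each "scan" stands in for one SUMIF evaluation, and `previous = None` plays the role of the header cell A1. It counts how many scans each formula would trigger on the question's six sample rows.

```python
def sumif(companies, dummy1, name):
    """Model SUMIF(A:A, name, B:B): one full pass over both columns."""
    return sum(v for c, v in zip(companies, dummy1) if c == name)

def fill_naive(companies, dummy1):
    """Model =SUMIF(A:A, A2, B:B) filled down: one scan per row."""
    scans, out = 0, []
    for name in companies:
        scans += 1
        out.append(sumif(companies, dummy1, name))
    return out, scans

def fill_guarded(companies, dummy1):
    """Model =IF(A2<>A1, SUMIF(A:A, A2, B:B), C1) filled down:
    scan only when the company differs from the row above."""
    scans, out = 0, []
    previous = None  # stands in for the header cell A1
    for name in companies:
        if name != previous:
            scans += 1
            out.append(sumif(companies, dummy1, name))
        else:
            out.append(out[-1])  # reuse the result from the row above (C1)
        previous = name
    return out, scans

companies = ["Company A"] * 3 + ["Company B"] * 3
dummy1 = [0, 0, 1, 1, 0, 0]

naive_out, naive_scans = fill_naive(companies, dummy1)
guarded_out, guarded_scans = fill_guarded(companies, dummy1)
print(naive_scans, guarded_scans)  # 6 scans versus 2 on this sample
```

Both versions produce the same column of results; on sorted data the guarded one performs one scan per company rather than one per row, which is where the 3× figure for the sample data comes from.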

Sample Calculation Timing Environment:

  • Excel 2010 64-bit (14.0.7015.1000) running under Windows 7 Pro on a business-class i5 laptop with 8 GB DRAM.
  • XLSB; Calculation Manual; Recalculate workbook before saving OFF; Save AutoRecovery information OFF

Test 1: 26 companies (Company A to Company Z), each with ~7683 entries in column A, sorted. Column B holds random 0's and 1's reverted to values. C2:C200000 was cleared and the worksheet calculated; then the formula was filled into C2:C200000 and a new calculation cycle was timed to completion.

 formula                                        calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                                    00:21:44
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)                    00:00:09

Test 2: 5000 companies (Company 0001 to Company 5000), each with ~40 entries in column A, sorted. Column B holds random 0's and 1's reverted to values. C2:C200000 was cleared and the worksheet calculated; then the formula was filled into C2:C200000 and a new calculation cycle was timed to completion.

 formula                                        calculation cycle (hh:mm:ss)
=SUMIF(A:A, A2, B:B)                                    00:22:10
=IF(A2<>A1, SUMIF(A:A, A2, B:B), C1)                    00:00:37
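For a quick sense of scale, the timings in the two tables can be converted to seconds and compared. This small Python sketch only does that arithmetic on the figures reported above; the helper name `to_seconds` and the dictionary labels are illustrative.

```python
def to_seconds(hhmmss):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# (plain SUMIF, IF-guarded SUMIF) timings copied from the tables above
timings = {
    "Test 1 (26 companies)": ("00:21:44", "00:00:09"),
    "Test 2 (5000 companies)": ("00:22:10", "00:00:37"),
}

speedups = {}
for label, (plain, guarded) in timings.items():
    speedups[label] = to_seconds(plain) / to_seconds(guarded)
    print(f"{label}: about {speedups[label]:.0f}x faster")
```

That works out to roughly 145× for Test 1 and roughly 36× for Test 2, consistent with the observation that fewer unique companies means a larger improvement.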


You cannot magically break the physical laws of time and space but sometimes you can fool them. This solution may not be perfect but perhaps it is something that you can live with.

On a related note, large(r) worksheets benefit from having their formulas reverted to result values once calculations have been made, if those results are not likely to change on a regular basis. While Copy, Paste Special, Values is a reasonably quick way to accomplish this, selecting a large number of cells containing formulas and running the following sub macro is lightning quick.

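Sub sel_2_Value()
    ' Replace the formulas in the current selection with their results.
    ' Events are switched off so Worksheet_Change handlers do not fire.
    Application.EnableEvents = False
    Selection.Value = Selection.Value
    Application.EnableEvents = True
End Sub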

If locale differences are not important (currency, dates, etc.), then Selection.Value = Selection.Value2 is even better.

The only thing that will slow down the above operation is the presence of formulas with dependents within the range being reverted to values, as those dependents will be recalculated.

