In Excel, remove duplicates from one column based on the values in another column, either through VBA or a combination of formulas/functions
I'm having trouble achieving this in an accurate and automated way. I've tried the approaches discussed here, here and here, but none of them work in my scenario.
I have a spreadsheet with thousands of rows of data. The data is organised as follows:
This data contains a number of duplicates that I need to remove, based on the IP address in Column A. However, the criterion is that I need to remove whichever duplicates do not have the longest duration. To better explain my scenario, see the sample image below:
I need a way to remove all duplicates of a particular IP address that do not contain the longest duration for that IP address. So, using the above example, row 3 would be deleted because its duration of 1 minute is shorter than the 36 minutes in row 4, which contains the same IP address.
Another example is that rows 5, 6 and 7 would also be removed, as all their durations are shorter than that of row 8, which has the same IP address but a longer duration. Of course, any rows already containing unique IP addresses would be left alone.
The end result using my above sample would be as follows:
Of course, in my sample above all the data was nicely sorted by IP address first and Duration second. In real life this isn't the case, but that's something easy enough for me to do prior to any solution, if necessary.
The key thing is that in some cases an IP address may be duplicated once, while in others it may be duplicated many times over. I just need to ensure that only the one with the longest duration remains. In the event that multiple instances of an IP address have the same longest duration, I want them all kept. That is, if an IP address is repeated ten times and its longest duration is an hour for two of those times, then both of them need to remain.
I'm happy with any solution for this, be it using formulas, functions or macros.
You can solve your task using a helper column (column D).
Insert the following array formula into cell D2:
=IF($C2=MAX(IF($A2=$A$2:$A$50,$C$2:$C$50,-1)),"Remain","Remove")
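where 50 is the last row of your table. The inner IF returns the duration for every row with the same IP address and -1 for all other rows, so MAX gives the longest duration for that IP address.
Remember to press Ctrl+Shift+Enter to complete the array formula correctly.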
Copy/paste the formula to the other cells in column D.
Apply a filter to column D on the value "Remove".
Delete the filtered rows.
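Since the question also allows macros, the same rule could be sketched in VBA. This is only a rough sketch under several assumptions not confirmed by the question: headers are in row 1, data starts in row 2, IP addresses are in column A, durations are in column C as numbers or Excel time values (not text), and the macro runs on Windows, where Scripting.Dictionary is available. The name RemoveShorterDuplicates is made up for illustration.

```vba
Option Explicit

' Sketch: keep only the rows whose duration (column C) equals the longest
' duration found for that row's IP address (column A). Ties are all kept.
' Assumption: headers in row 1, numeric durations, active sheet holds the data.
Sub RemoveShorterDuplicates()
    Dim ws As Worksheet
    Dim maxByIp As Object          ' late-bound Scripting.Dictionary
    Dim lastRow As Long, r As Long
    Dim ip As String, dur As Double

    Set ws = ActiveSheet
    Set maxByIp = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Pass 1: record the longest duration seen for each IP address.
    For r = 2 To lastRow
        ip = CStr(ws.Cells(r, "A").Value)
        dur = CDbl(ws.Cells(r, "C").Value)
        If Not maxByIp.Exists(ip) Then
            maxByIp.Add ip, dur
        ElseIf dur > maxByIp(ip) Then
            maxByIp(ip) = dur
        End If
    Next r

    ' Pass 2: delete from the bottom up so remaining row numbers stay valid.
    Application.ScreenUpdating = False
    For r = lastRow To 2 Step -1
        ip = CStr(ws.Cells(r, "A").Value)
        If CDbl(ws.Cells(r, "C").Value) < maxByIp(ip) Then
            ws.Rows(r).Delete
        End If
    Next r
    Application.ScreenUpdating = True
End Sub
```

The data does not need to be sorted first, because the first pass collects the maximum per IP address before anything is deleted. It is safest to try this on a copy of the sheet, as row deletion by a macro cannot be undone with Ctrl+Z.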