How to amend a column based on duplicates in another and leave a unique value in Excel
I have a spreadsheet with a lot of duplicates that I need to cleanse, but I need to ensure the right data in another column is kept.
Data and desired outcome
Essentially, column E contains duplicate values, but each value can be duplicated any number of times; the count is not the same each time.
In column D, each record should have either an A, a B, or a blank.
The trouble is that some duplicate sets have different values in column D. I need a way to remove all the duplicates from column E, so that each value in column E is unique, while still keeping the right value from column D.
There are currently 3 different results in the raw data:
result 1: The duplicate sets (e.g. all HC0206 duplicates or all HC0208 duplicates in column E) have the same value in column D (either all blank, all A or all B). These are fine and don't cause a problem.
result 2: The duplicate sets have both blank and A in column D. When duplicates are removed, an A must remain in column D.
result 3: The duplicate sets have both blank and B in column D. When duplicates are removed, a B must remain in column D.
No duplicate set has both A and B, so we don't have to worry about that possibility.
I just can't work out how to ensure that, when the duplicates are removed from results 2 and 3 above, the letter remains and not the blank. If I could work out a way to ensure that all duplicate sets have the same value in column D, then I could just remove duplicates without issue.
Any help would be greatly appreciated.
Thanks
Talk about overthinking... you could do it with a formula in Office 365:
=LET(sorted,SUBSTITUTE(SORT(SORT(FILTER(D:E,E:E<>"","")),2),"",""),uniqueE,UNIQUE(INDEX(sorted,,2)),matchD,INDEX(INDEX(sorted,,1),MATCH(uniqueE,INDEX(sorted,,2),0)),CHOOSE({1,2},matchD,uniqueE))
The sorted part makes sure columns D:E are sorted by column 1, then by column 2, and that blanks (which would otherwise result in 0) are shown as actual blanks. The sorting is for later use.
The uniqueE part results in the unique values in column E.
The matchD part matches the unique values in uniqueE against sorted. The first match in column 2 of sorted returns the corresponding (indexed) value from column 1 of sorted.
matchD followed by uniqueE is your spilled result.
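If it helps to see the idea outside Excel, here is a rough Python sketch of the same logic: order the rows so that a letter comes before a blank within each column E value, then keep the first row per value. It is not a translation of the formula above, just an illustration. The function name dedupe_keep_letter and the sample rows are made up for this example (HC0210 in particular is not from the question), and it assumes column D only ever holds "A", "B" or an empty string.

```python
def dedupe_keep_letter(rows):
    """Illustrative helper (hypothetical name).

    rows: list of (d, e) tuples, where d is "A", "B" or "" (blank)
    and e is the value from column E.
    """
    # Ignore rows whose column E value is blank, like the FILTER step.
    rows = [r for r in rows if r[1] != ""]
    # Sort by column E; within each E value put non-blank D values first,
    # so a letter always precedes a blank for the same key.
    ordered = sorted(rows, key=lambda r: (r[1], r[0] == "", r[0]))
    first_per_key = {}
    for d, e in ordered:
        # Keep only the first row seen for each E value.
        first_per_key.setdefault(e, d)
    return [(d, e) for e, d in first_per_key.items()]


# Made-up sample data covering results 1, 2 and 3 from the question.
sample = [
    ("", "HC0206"), ("", "HC0206"),
    ("", "HC0208"), ("A", "HC0208"),
    ("B", "HC0210"), ("", "HC0210"), ("", "HC0210"),
]
print(dedupe_keep_letter(sample))
```

The sort key does the real work here: because no set mixes A and B, putting non-blanks first is enough to guarantee the letter survives.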