I'm putting together a VBA script that compares two columns of data (about 15,000 rows each) and determines if any cell is a permutation/ of another cell.
For example, if A1 = 15091 and B52 = 19510 then the function would identify them as having the same set of characters.
I have a loop set up that checks each individual cell in column A against every other cell in column B and various functions within the loop, but have been unsuccessful thus far in anything that will accomplish this task.
In addition, the problem gets compounded by the fact that "number" formatted cells will drop all zeroes after the decimal so that 15091.1 will not be identified as the same set of characters as 15091.01.
You can do this without VBA, using a pure Excel approach. (Though find the VBA solution below) The idea is to build a kind of "hash-value" for each value that is the same for each permutation of a set of digits - without overlapping with other hashes.
One would to do so is to:
Then, all you need to do is to match these hashes against each other (using Excel's MATCH
function) and see if something is found (using the ISERROR
function).
Step by step instruction for Excel (assuming that your data is in Sheet1 and Sheet2, column A, starting in A1:
=TEXT(A3,"0")
- this will get rid of the remainder each number and convert it to a text. Copy the formula down till the end of your range =10^C1
=LEN($B3)-LEN(SUBSTITUTE($B3,C$1,""))
- and copy it to the right till column L and down till the end of your list. This will count the number of digits =SUMPRODUCT(C3:L3,$C$2:$L$2)
- this will calculate the hash =NOT(ISERROR(MATCH(M3,Sheet2!$M:$M,0)))
Done!
Here's a VBA solution:
Option Explicit
Sub IdentifyMatches()
Dim rngKeys As Range, rngToMatch As Range, rngCell As Range
Dim dicHashes As Object
'the range you want to have highlighted in case of a match
Set rngKeys = Sheets("Sheet1").Range("A3:A5")
'the range to search for matches
Set rngToMatch = Sheets("Sheet2").Range("A3:A5")
Set dicHashes = CreateObject("Scripting.Dictionary")
'Create dictionary of hashes (dictionary is used for its .Exists property
For Each rngCell In rngToMatch
dicHashes(GetHash(rngCell)) = True
Next
'Check each cell in rngKey if it has a match
For Each rngCell In rngKeys
If dicHashes.Exists(GetHash(rngCell)) Then
'Action to take in case of a match
rngCell.Font.Bold = True
Debug.Print rngCell.Value & " has a match!"
Else
rngCell.Font.Bold = False
End If
Next
End Sub
Function GetHash(rngValue As Range) As Long
Dim strValue As String
Dim i As Integer, digit As Integer
Dim result As Long
Dim digits(0 To 9) As Integer
'Potentially add error check here
strValue = Format(rngValue.Value, "0")
For i = 1 To Len(strValue)
digit = Int(Mid(strValue, i, 1))
digits(digit) = digits(digit) + 1
Next i
For i = 0 To 9
result = result + 10 ^ i * digits(i)
Next i
GetHash = result
End Function
Last but not least, here's the example file .
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.