简体   繁体   中英

Excel VBA to compare partial rows and delete duplicates

I need to compare partial data in 2 rows, and clear the data from one row. I cannot delete any rows entirely. Due to the size of my file, formulas are not preferred (though I have spent hours filtering and deleting in past years).

I have a massive file, 100,000+ rows. The first 18 columns include identifiers and cannot be deleted. The next 30-ish columns (variable) contain sporadic data. Most of the table is blank.

Due to data entry issues, some - but not all - of the data is a duplicate of the previous row. Using specific columns in the first 18 fields, I can identify which rows may be similar.

I need code to say: If these 3 (non-sequesntial) columns in row A and row B are the same, then compare the full range of data in Col 19 to Last Col. Compare A to B and delete the 2nd row. Move to the next pair of rows.

I have working code to delete one cell at a time. But I can't actually trust the single-cell comparisons - I really need to see if the entire data set for that record is a duplicate of the previous record. Given this issue, plus the number of blank cells in the data, I believe I need to either create a concatenation of the row data for comparison or use an array. I cannot find code like this that makes sense for my data set.

Sub DeleteCopyData()

    Dim ws As Worksheet
    Set ws = Application.ActiveSheet

    Dim c As Range

    Dim lRow As Long, lCol As Long
    Dim cStart As Range
    Set cStart = Range("A1")

    'Find last row & column.
    lRow = ws.Cells(ws.Rows.Count, cStart.Column).End(xlUp).Row
    lCol = ws.Cells(cStart.Row, ws.Columns.Count).End(xlToLeft).Column

    With ws
        For i = lRow To 2 Step -1
            'If identifiers in Col 11, 3, and 6 are the same, and the data in Col 24 is the same, clear the duplicate row data.
            If .Cells(i, 11) = .Cells((i - 1), 11) And _
                .Cells(i, 3) = .Cells((i - 1), 3) And _
                .Cells(i, 6) = .Cells((i - 1), 6) And _
                .Cells(i, 24) = .Cells((i - 1), 24) Then     'This needs to be a range or an array of some kind.
                .Cells(i, 24).Clear
            'The 2 lines above this work for one cell, but I need it to compare all data from Col 19 to lCol.
            'If data is the same, clear the duplicate data in row i from Col 19 to lCol.
             End If
        Next i
    End With
End Sub

Untested but this should be close:

Edited - had Exit Sub instead of Exit For

Sub DeleteCopyData()

    Dim ws As Worksheet
    Dim c As Range
    Dim lRow As Long, lCol As Long
    Dim cStart As Range
    Dim arr, i As Long, n As Long, rowmatch As Boolean

    Set ws = Application.ActiveSheet
    Set cStart = ws.Range("A1")

    arr = Array(3, 6, 11) 'first set of columns to test for match

    'Find last row & column.
    lRow = ws.Cells(ws.Rows.Count, cStart.Column).End(xlUp).Row
    lCol = ws.Cells(cStart.Row, ws.Columns.Count).End(xlToLeft).Column

    With ws
        For i = lRow To 2 Step -1

            rowmatch = True

            'perform the initial match on 3 cols...
            For n = LBound(arr) To UBound(arr)
                If .Cells(i, arr(n)) <> .Cells((i - 1), arr(n)) Then
                    rowmatch = False
                    Exit For
                End If
            Next n

            'got through the first tests - look at the cells starting in col 19
            If rowmatch Then
                For n = 19 To lCol
                    If .Cells(i, n) <> .Cells((i - 1), n) Then
                        rowmatch = False
                        Exit For
                    End If
                Next n
            End If

            'no mismatches, so clear from col 19 to end of row
            If rowmatch Then .Range(.Cells(i, 19), .Cells(i, lCol)).ClearContents
        Next i
    End With
End Sub
Sub DeleteCopyData()

    Dim ws As Worksheet, lRow As Long, lCol As Long, cStart As Range, C As Range

    Set ws = Application.ActiveSheet
    Set cStart = ws.Range("A1")

    'Find last row & column.
    lRow = ws.Cells(ws.Rows.Count, cStart.Column).End(xlUp).Row
    lCol = ws.Cells(cStart.Row, ws.Columns.Count).End(xlToLeft).Column

    With ws
        For i = lRow To 2 Step -1
            'If identifiers in Col 11, 3, and 6 are the same, and the data in Col 19:28 is the same, clear the duplicate row data.
            If .Cells(i, 11) = .Cells((i-1), 11) And _
                .Cells(i, 3) = .Cells((i - 1), 3) And _
                .Cells(i, 6) = .Cells((i - 1), 6) And _
                Join(Application.Transpose(.Range(.Cells(i, 19), .Cells(i, lCol))), Chr(0)) = _
                Join(Application.Transpose(.Range(.Cells(i-1, 19), .Cells(i-1, lCol))), Chr(0)) Then
                .Range(.Cells(i, 19), .Cells(i, 28)).Clear
             End If
        Next i
    End With
End Sub

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM