简体   繁体   中英

Clean data having multiple delimiters using vba

I have a list of manually entered ticket numbers with different delimiters and some strings. I am trying to clean it to just have the ticket numbers starting with "AK1" in a single string separated with commas.

在此处输入图片说明

数据

Multi Split

Dirty Version

In this version all the strings are split and written using the principle one cell to one (another) cell.

Sub MultiSplit()

    Const cDel As String = ";,/"  ' Delimiter List
    Const cCol1 As Variant = "A"  ' Source Column Letter/Number
    Const cCol2 As Variant = "B"  ' Target Column Letter/Number
    Const cDelR As String = ","   ' Replace Delimiter
    Const cFirstR As Long = 1     ' First Row Number

    Dim vntR As Variant   ' Range Array
    Dim vntD As Variant   ' Delimiter Array

    Dim LastR As Long     ' Last Row Number
    Dim i As Long         ' Range Array Row Counter
    Dim j As Long         ' Delimiter Array Row Counter

    ' Calculate Last Row Number.
    LastR = Cells(Rows.Count, cCol1).End(xlUp).Row

    ' Copy Source Range into Range Array.
    vntR = Range(Cells(cFirstR, cCol1), Cells(LastR, cCol1))

    ' Split Delimiter List into Delimiter Array
    vntD = Split(cDel, ",")

    ' Calculate values in Range Array.
    For i = 1 To UBound(vntR) ' Range Array
        For j = 0 To UBound(vntD) ' Delimiter Array
            ' Replace by overwriting.
            vntR(i, 1) = Replace(vntR(i, 1), vntD(j), cDelR)
        Next
    Next

    ' Copy Range Array to Target Range.
    Range(Cells(cFirstR, cCol2), Cells(LastR, cCol2)) = vntR

End Sub

Clean One String Version

If you want all the AK1 tickets in a single cell then use the following code. Adjust cDelC (the final delimiter) to fit your needs ( eg aa,aa or aa, aa).

Sub MultiSplit2()

    Const cDel As String = ";,/"     ' Delimiter List
    Const cCol1 As Variant = "A"     ' Source Column Letter/Number
    Const cCol2 As Variant = "B"     ' Target Column Letter/Number
    Const cDelR As String = ","      ' Replace Delimiter
    Const cFirstR As Long = 1        ' First Row Number
    Const cDelC As String = ", "     ' Clean Delimiter
    Const cString As String = "AK1"  ' Desired Start String

    Dim vntR As Variant   ' Range Array
    Dim vntD As Variant   ' Delimiter Array
    Dim vntT As Variant   ' Temporary Array

    Dim LastR As Long     ' Last Row Number
    Dim i As Long         ' Range Array Row Counter
    Dim j As Long         ' Delimiter Array Row Counter
    Dim strT As String    ' Target String


    ' Calculate Last Row Number.
    LastR = Cells(Rows.Count, cCol1).End(xlUp).Row

    ' Copy Source Range into Range Array.
    vntR = Range(Cells(cFirstR, cCol1), Cells(LastR, cCol1))

    ' Split Delimiter List into Delimiter Array
    vntD = Split(cDel, ",")

    ' Calculate values in Range Array.
    For i = 1 To UBound(vntR) ' Range Array
        For j = 0 To UBound(vntD) ' Delimiter Array
            ' Replace by overwriting.
            vntR(i, 1) = Replace(vntR(i, 1), vntD(j), cDelR)
        Next
        Debug.Print vntR(i, 1)
    Next

    ' Clean the strings in Range Array.
    For i = 1 To UBound(vntR)
        vntT = Split(vntR(i, 1), cDelR)
        For j = 0 To UBound(vntT)
            If Left(Trim(vntT(j)), Len(cString)) = cString Then
                If strT <> "" Then
                    strT = strT & cDelC & Trim(vntT(j))
                  Else
                    strT = Trim(vntT(j))
                End If
            End If
        Next
    Next

    ' Copy Target String to Target Cell.
    Cells(cFirstR, cCol2) = strT

End Sub

I suggest to do this with a UDF (User Defined Function. Install the code below in a standard code module (Press Alt+F11 to open the VB Editor window. Right-click on the VBA Project in the Project explorer window on the left, Select Insert > Module, and paste the code in the empty code panel on the right). Remember to save the workbook in xlsm (macro enabled) format.

Function ExtractAK1(Cell As Range) As String

    Const AK1 As String = "AK1-"

    Dim Var As Variant
    Dim Sp() As String
    Dim i As Integer

    Var = Cell.Value
    If VarType(Var) = vbString Then
        If InStr(1, Var, AK1, vbTextCompare) Then
            Sp = Split(Trim(Var), AK1)
            For i = 1 To UBound(Sp)
                Sp(i) = AK1 & Left(Trim(Sp(i)), 5)
            Next i
            Var = Join(Sp, ",")
            ExtractAK1 = Mid(Var, InStr(Var, ",") + 1)
        End If
    End If
End Function

Call the function in the worksheet like you would call a built-in Excel function, for example,

=ExtractAK1($A2)

If properly installed, Excel will suggest the function's name when you start typing it. $A2 is the cell containing your text. Copy the formula down for as long as needed. You can re-purpose the function for use in a loop if that is the more convenient way of using it.

The following UDF will extract whatever you input into a comma delimited list of AK ticket numbers only. It is assumed that the ticket number pattern is AK- followed by digits only, which is what you show. And only the ticket numbers are extracted, also what you say you want.

  • If you input a single string, or a single cell, those contents will come out.
  • If you input a range of cells, they will be combined into a single output string.
  • VBA's Regular Expression engine is used to extract the ticket numbers

Option Explicit
  Public RE As Object
  Public MC As Object
  Public M As Object

    'Assume starts with AK- and ends with numbers
    '  as per your example
 Public Const sPat As String = "\bAK1-\d+"

Function getAK(vIN As Variant) As String
    Dim V As Variant
    Dim sTemp As String

Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = sPat
    .ignorecase = False
    .Global = True
End With

If IsArray(vIN) Then
    For Each V In vIN
        sTemp = sTemp & "," & getStrOnly(CStr(V))
    Next V
Else
    getAK = getStrOnly(CStr(vIN))
    Exit Function
End If

getAK = Mid(sTemp, 2)

End Function

Private Function getStrOnly(str As String) As String
    Dim sTemp As String
    With RE
        If .test(str) = True Then
            Set MC = .Execute(str)
            For Each M In MC
                sTemp = sTemp & "," & M
            Next M
        End If
    End With
    getStrOnly = Mid(sTemp, 2)
End Function

Using the single cell version: getAK(A1) :

在此处输入图片说明

Using the multiple cell method:

=getAK(A1:A12)

we get

AK1-97760,AK1-96767,AK1-97719,AK1-97999,AK1-98105,,AK1-97113,AK1-97073,AK1-97019,AK1-97951,AK1-97858,AK1-97195,AK1-96806,AK1-97719,AK1-97896,AK1-98115,AK1-98151,AK1-98089,AK1-96780,AK1-90919,AK1-96705,AK1-96806,AK1-95397

If you also want to return the status of the ticket (the part in parentheses after the ticket number), you can change the regex to:

"\bAK1-\d+(?:\s*\([^)]+\))?"

And if your ticket patterns are different, you can also alter the regex accordingly.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM