
Parse a long string with different delimiter characters in VBA

I have been racking my brain over this. I need to parse a long string like this one:

2003|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Ford|Thunderbird 2002|Lincoln|LS 2002|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Lincoln|LS 2003|Lincoln|LS 2001|Lincoln|LS 2003|Ford|Thunderbird 2004|Lincoln|LS 2004|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2004|Ford|Thunderbird 2005|Jaguar|S-Type|Sport Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Lincoln|LS 2004|Jaguar|XJ8 2005|Jaguar|S-Type|Sport Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2006|Jaguar|S-Type|VDP Edition Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2005|Jaguar|XJ8 2004|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. 
V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2005|Ford|Thunderbird 2006|Lincoln|LS 2000|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2001|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2005|Jaguar|S-Type|Sport Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2004|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2004|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2005|Jaguar|S-Type|Sport Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 

Better structure

I know that my final table has 6 columns: 3 are required (year, make, model) and 3 are optional (trim, engine, notes).

The engine value is merged with the notes and is separated from them by "::". The other fields are separated by "|".

Final table

Here is part of my code; it does not work correctly. Any suggestions and improvements are welcome and appreciated :)

Dim Ret
Dim Ret2
Dim strColumnA As String

strColumnA = wsTestComp.Range("A1")
Ret = Split(strColumnA, "|")
j = 1
k = 1
For i = LBound(Ret) To UBound(Ret)

    Debug.Print Ret(i)
    If IsNumeric(Ret(i)) Then
        wsTestComp.Range("A2").Offset(k, j).value = Ret(i)
        j = j + 1
    Else
        If IsNumeric(Right(Ret(i), 4)) Then
        Ret2 = Split(Ret(i), "::")
        For h = LBound(Ret2) To UBound(Ret2)
            If IsNumeric(Right(Ret(i), 4)) Then
            wsTestComp.Range("A2").Offset(k, j).value = Left(Ret2(h), Len(Ret2(h)) - 5)
            Else
            wsTestComp.Range("A2").Offset(k, j).value = Ret2(h)
            j = j + 1
            End If
        Next h

        k = k + 1
        Else
        wsTestComp.Range("A2").Offset(k, j).value = Ret(i)
        j = j + 1
        End If
        End If

Next i

Use a VBScript.RegExp to locate the vehicle years and replace the space in front of each one with a character that can be uniquely distinguished from the rest of the clutter, then use the Split function on that character. The double colons can be taken care of with a simple Replace function.

Sub makeCars()
    Dim tmp As String, y As Long, bUSE_REGEX As Boolean
    Dim pattern As String, replacement As String
    Dim rgx As Object, cmat As Object
    Dim v1 As Variant, v2 As Variant

    bUSE_REGEX = True

    With Worksheets("Sheet1")
        tmp = .Range("A1").Value2
        tmp = Replace(tmp, Chr(58) & Chr(58), Chr(124))
        tmp = Replace(tmp, Chr(124), Chr(167))
    End With

    If bUSE_REGEX Then
        'REGEX method
        Set rgx = CreateObject("VBScript.RegExp")
        With rgx
            .Global = True
            .pattern = "\s[0-9]{4}\§"
            Set cmat = .Execute(tmp)
            For y = 0 To cmat.Count - 1
                replacement = Replace(cmat(y), Chr(32), Chr(182))
                tmp = Replace(tmp, cmat(y), replacement)
            Next y
        End With
    Else
        'non-REGEX method
        For y = 1950 To 2025
            tmp = Replace(tmp, Chr(32) & y & Chr(167), Chr(182) & y & Chr(167))
        Next y
    End If

    With Worksheets("Sheet1")
        v1 = Split(tmp, Chr(182))
        For y = LBound(v1) To UBound(v1)
            v2 = Split(v1(y), Chr(167))
            .Cells(y + 2, 1).Resize(1, UBound(v2) + 1) = v2
        Next y
    End With

End Sub

I've offered an alternative to the RegEx solution that simply cycles through the 76 possible model years from 1950 to 2025. While a little brute-force, it gets the job done, and it would be hard to even measure the difference between the two methods in milliseconds. This is viable here because the range of possible years is reasonably limited; a wider range of possibilities should be handled by RegEx.
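As a side note, the match loop in the RegEx branch can probably be collapsed into a single RegExp.Replace call that uses a capture group. The following is only a sketch under that assumption: the function name MarkRecordStarts is hypothetical (it is not part of the answer above), and it expects the text to already use Chr(167) as the field separator, as in makeCars.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): turns the whitespace
' in front of every "four digits + Chr(167)" sequence into Chr(182), so the
' result can be split into records with Split(result, Chr(182)).
Function MarkRecordStarts(ByVal txt As String) As String
    Dim rgx As Object

    Set rgx = CreateObject("VBScript.RegExp")
    With rgx
        .Global = True
        ' The parentheses capture the year and its separator so that
        ' they can be written back unchanged as $1.
        .Pattern = "\s([0-9]{4}" & Chr(167) & ")"
        MarkRecordStarts = .Replace(txt, Chr(182) & "$1")
    End With
End Function
```

Inside makeCars, the whole For y = 0 To cmat.Count - 1 loop could then be replaced by tmp = MarkRecordStarts(tmp), assuming the helper lives in the same module.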


The key is to recognize the year.

Here is a bare-bones version of the code:

Option Explicit

Sub parsestring()

Dim Ret As Variant
Dim i As Long
Dim rng As Range

Set rng = ThisWorkbook.Worksheets("parse").Cells(1, 1) '<== cell with the string to parse

Ret = Split(Replace(Replace(rng.Value, "|", " |"), "::", " |"), " ")
For i = LBound(Ret) To UBound(Ret)
    If Ret(i) Like "####" Then Ret(i) = "§§" & Ret(i)
Next i
Ret = Split(Join(Ret), "§§")

With rng.Offset(2, 2) '<== the "database" will be placed two rows and columns away from the cell with the string to parse
    .Resize(UBound(Ret) + 1) = WorksheetFunction.Transpose(Ret)
    .Resize(UBound(Ret) + 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, Other:=True, OtherChar:="|"
    .CurrentRegion.EntireColumn.AutoFit
End With

End Sub

And here is a version with a little formatting and data sorting:

Sub parsestring2()

Dim Ret As Variant
Dim i As Long
Dim rng As Range

Set rng = ThisWorkbook.Worksheets("parse").Cells(1, 1) '<== cell with the string to parse


Ret = Split(Replace(Replace(rng.Value, "|", " |"), "::", " |"), " ")
For i = LBound(Ret) To UBound(Ret)
    If Ret(i) Like "####" Then Ret(i) = "§§" & Ret(i)
Next i
Ret = Split(Join(Ret), "§§")

With rng.Offset(2, 2) '<== the "database" will be placed two rows and columns away from the cell with the string to parse
    .Resize(UBound(Ret) + 1) = WorksheetFunction.Transpose(Ret)
    .Resize(UBound(Ret) + 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, Other:=True, OtherChar:="|"
    With .Resize(1, 6)
        .Value = Array("Year", "Make", "Model", "Trim", "Engine", "Notes")
        .Interior.ColorIndex = 16
        .Font.ColorIndex = 2
    End With
    .CurrentRegion.Sort key1:="Year", order1:=xlDescending, key2:="Make", order2:=xlAscending, key3:="Model", order3:=xlAscending, header:=xlYes
    .CurrentRegion.EntireColumn.AutoFit
End With

End Sub
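Both routines above rely on the same tagging step, which can be isolated for reuse or testing. The sketch below assumes the same "§§" marker; the name SplitRecords is hypothetical. Because the input starts with a year, splitting on the marker yields an empty first element (in parsestring2 that blank row is overwritten by the header row), so this version strips the leading marker first. Note that any standalone four-digit token, for example inside a notes field, would also be treated as a year.

```vba
Option Explicit

' Hypothetical helper: returns a zero-based array with one string per
' vehicle record; the fields inside each record are separated by "|".
Function SplitRecords(ByVal txt As String) As Variant
    Const MARK As String = "§§"
    Dim tokens As Variant
    Dim i As Long

    ' Put a space in front of every delimiter so that years become
    ' standalone tokens, then split the text on spaces.
    tokens = Split(Replace(Replace(txt, "|", " |"), "::", " |"), " ")

    ' Tag every four-digit token as the start of a new record.
    For i = LBound(tokens) To UBound(tokens)
        If tokens(i) Like "####" Then tokens(i) = MARK & tokens(i)
    Next i
    txt = Join(tokens, " ")

    ' Drop a leading marker so Split does not return an empty first record.
    If Left$(txt, Len(MARK)) = MARK Then txt = Mid$(txt, Len(MARK) + 1)

    SplitRecords = Split(txt, MARK)
End Function
```

Each field keeps the space that was inserted in front of the following delimiter, so it is worth applying Trim$ to the pieces returned by Split(record, "|") before comparing or storing them.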
