
How to use VBA and RegEx in Excel to replace data?

I have a big .csv file (~600k lines, 56 MB) containing database records: each line holds an id, a client name, a client address, a client date of birth, etc. The problem is that on some lines the data is badly written (there are commas that shouldn't be there, and they shift the columns).

I figured I needed a regex to detect the problematic lines and replace the wrong commas with a dash or some other character. I followed this article and, after a few tries, got it to detect the broken lines.

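Private Sub simpleRegex()
    ' Detects lines in A1:A2000 where a comma sits between two words
    ' ([A-z] also matches [ \ ] ^ _ ` so the pattern uses [A-Za-z] instead)
    Dim strPattern As String: strPattern = "[^a-zA-Z0-9_,\-]([A-Za-z]+)\,[^a-zA-Z0-9_,\-]([A-Za-z]+)"

    ' Note: the replacement is inserted as literal text (only $ tokens such as $1 are special),
    ' so these regex-like characters are NOT interpreted as a pattern
    Dim strReplace As String: strReplace = "[^a-zA-Z0-9_,\-][A-z]+\-[^a-zA-Z0-9_,\-][A-z]"

    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    Dim strInput As String
    Dim Myrange As Range
    Dim cell As Range

    ' Configure the regex once, outside the loop
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With

    Set Myrange = ActiveSheet.Range("A1:A2000")

    For Each cell In Myrange
        strInput = cell.Value
        ' Show the replaced text for every line that matches
        If regEx.Test(strInput) Then
            MsgBox regEx.Replace(strInput, strReplace)
        End If
    Next cell
End Sub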

The problem is that this only works if I want to replace the targeted text with a single predefined string (like strReplace = "replacement words"). What I want is to target a sequence of characters that matches my pattern and replace only one character in it (the comma). For example, I want to turn:

728,"HAY,HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68;

into:

728,"HAY-HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68;

Do you have a solution?

(Sorry for my bad English, it's not my native language.)

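You can use a lookbehind, (?<=(Your Word)), to match specific characters that follow a specific word. In your case, this pattern finds the comma (note that the VBScript.RegExp engine used from VBA does not support lookbehind, so this only works in engines that do, such as PCRE or .NET):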

(?<=(HAY))\,

Update:

Try this instead (I also updated the demo):

,(?=[^"]+")

Demo: https://regex101.com/r/0rtcFt/6
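Unlike lookbehind, lookahead is supported by VBScript.RegExp, so this pattern can be applied from VBA with a single Replace call. A minimal sketch, assuming late binding via CreateObject (no reference needed); the names FixCommasLookahead and DemoLookahead are hypothetical:

```vba
' Hypothetical helper: replace every comma that is followed, later on the
' line, by a run of non-quote characters and then a double quote
Public Function FixCommasLookahead(ByVal s As String) As String
    Dim rx As Object
    Set rx = CreateObject("VBScript.RegExp")   ' late binding
    rx.Pattern = ",(?=[^""]+"")"               ' the pattern ,(?=[^"]+") from above
    rx.Global = True                           ' replace every match, not just the first
    FixCommasLookahead = rx.Replace(s, "-")
End Function

Sub DemoLookahead()
    Debug.Print FixCommasLookahead("728,""HAY,HAYE"",Marie,68;")
    ' => 728,"HAY-HAYE",Marie,68;
End Sub
```

Be aware that the lookahead only checks that a " appears somewhere later on the line, so commas in unquoted fields that come before a quoted field are replaced too: 1,2,"a" becomes 1-2,"a". It works on the sample line because the only comma before the quoted field is immediately followed by the opening quote.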

The correct approach here (since you commented that double quotes only appear as field delimiters) is to match double-quoted substrings with a simple "[^"]+" regex and replace commas with hyphens only inside the matches.

Here is some sample code:

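Sub CallbackTest()
    ' Early binding: requires Tools > References > "Microsoft VBScript Regular Expressions 5.5"
    Dim rxStr As RegExp
    Dim s As String
    Dim m As Object

    s = """SOME,MORE,HERE"",728,""HAY,HAYE"",Marie,François,RAUTUREAU,85,29/05/1856,68;"

    Set rxStr = New RegExp
    rxStr.Pattern = """[^""]+"""    ' a double-quoted substring, quotes included
    rxStr.Global = True             ' find all quoted substrings, not just the first

    ' Rebuild s around each match, replacing commas only inside the match.
    ' This is safe because "," -> "-" keeps the length, so FirstIndex stays valid.
    For Each m In rxStr.Execute(s)
        s = Left(s, m.FirstIndex) & Replace(m.Value, ",", "-") & Mid(s, m.FirstIndex + Len(m.Value) + 1)
    Next m
    Debug.Print s              ' Print demo results
    ' => "SOME-MORE-HERE",728,"HAY-HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68;
End Sub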
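The same match-then-replace technique can be wrapped in a function and called once per line. This is a sketch under a few assumptions: the name HyphenateQuotedCommas is hypothetical, late binding replaces the early-bound RegExp reference, and the pattern uses [^"]* instead of [^"]+ so that an empty quoted field ("") is consumed as a pair instead of throwing off the quote pairing for the rest of the line.

```vba
' Hypothetical helper: hyphenate commas inside every double-quoted field
Public Function HyphenateQuotedCommas(ByVal s As String) As String
    Dim rx As Object, m As Object
    Set rx = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    rx.Pattern = """[^""]*"""                  ' a quoted field, quotes included; may be empty
    rx.Global = True                           ' find every quoted field on the line
    For Each m In rx.Execute(s)
        ' "," -> "-" keeps the length unchanged, so the FirstIndex values
        ' of later matches stay valid while s is rebuilt
        s = Left$(s, m.FirstIndex) & Replace(m.Value, ",", "-") & _
            Mid$(s, m.FirstIndex + m.Length + 1)
    Next m
    HyphenateQuotedCommas = s
End Function
```

For example, HyphenateQuotedCommas("""SOME,MORE,HERE"",728,""HAY,HAYE"",Marie,68;") should return "SOME-MORE-HERE",728,"HAY-HAYE",Marie,68; with the commas outside the quotes left alone.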

If I understood you correctly, there is no need for a regex at all: a simple Split does the job.

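Private Sub simpleReplace()
  ' "parts" instead of "str" avoids shadowing the built-in Str function
  Dim parts() As String, cell As Range, Myrange As Range, i As Long
  Set Myrange = ActiveSheet.Range("A1:A2000")
  For Each cell In Myrange
    parts = Split(cell.Value, """")        ' split the line on double quotes
    If UBound(parts) > 0 Then              ' at least one " was found (empty cells give -1)
      ' odd indices hold the text that was between quotes
      For i = 1 To UBound(parts) Step 2
        parts(i) = Replace(parts(i), ",", "-")
      Next
      cell.Value = Join(parts, """")
    End If
  Next
End Sub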

This splits your 728,"HAY,HAYE",Marie,François,RAUTUREAU,85,29/05/1856,68; into:

(0) 728,
(1) HAY,HAYE
(2) ,Marie,François,RAUTUREAU,85,29/05/1856,68;

Every second element of the array (the odd indices) was originally enclosed in ", so all that's left is to replace the commas in those elements and write the joined string back into the cell.

If a line contains no ", it is skipped, because the upper bound of the array is then 0.
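With ~600k lines, writing every cell back one at a time can be slow, so here is a sketch that wraps the Split/Join idea in a function and applies it to a whole column through a single Variant array read and write. The names FixQuotedCommasSplit and FixColumnA are hypothetical, and the driver assumes the raw lines sit in column A of the active sheet, starting at A1, as plain text values (any formulas in that range would be overwritten by their values).

```vba
' Hypothetical helper: hyphenate commas that sit between pairs of double quotes
Public Function FixQuotedCommasSplit(ByVal s As String) As String
    Dim parts() As String, i As Long
    parts = Split(s, """")
    ' After splitting on ", the odd indices hold the quoted text
    For i = 1 To UBound(parts) Step 2
        parts(i) = Replace(parts(i), ",", "-")
    Next i
    FixQuotedCommasSplit = Join(parts, """")
End Function

' Hypothetical driver: assumes plain-text lines in column A, starting at A1
Public Sub FixColumnA()
    Dim ws As Worksheet, rng As Range, data As Variant, r As Long
    Set ws = ActiveSheet
    Set rng = ws.Range("A1", ws.Cells(ws.Rows.Count, 1).End(xlUp))

    ' A single-cell range returns a scalar, not an array, so handle it separately
    If rng.Cells.Count = 1 Then
        If VarType(rng.Value) = vbString Then rng.Value = FixQuotedCommasSplit(rng.Value)
        Exit Sub
    End If

    data = rng.Value                      ' 2-D array, indexed data(row, 1)
    For r = 1 To UBound(data, 1)
        ' Skip numbers, blanks and error values
        If VarType(data(r, 1)) = vbString Then
            data(r, 1) = FixQuotedCommasSplit(data(r, 1))
        End If
    Next r
    rng.Value = data                      ' write everything back in one step
End Sub
```

The function on its own behaves like the loop body above: a line without quotes comes back unchanged, and an empty string stays empty.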

If you still have any questions, or if this isn't what you're looking for, please let me know :)
