
Issues stripping special characters from text in VBA

I have an Excel file that pulls in data from a csv, manipulates it a bit, and then saves it down as a series of text files.

There are some special characters in the source data that trip things up, so I added this to strip them out:

Const SpecialCharacters As String = "!,@,#,$,%,^,&,*,(,),{,[,],},?,â,€,™"

Function ReplaceSpecialCharacters(myString As String) As String

Dim newString As String
Dim char As Variant

newString = myString

For Each char In Split(SpecialCharacters, ",")
    newString = Replace(newString, char, "")
Next

ReplaceSpecialCharacters = newString

End Function

The issue is that this doesn't catch all of them. When I try to process the following text it slips through the above code and causes Excel to error out.

Hero’s Village

I think the issue is that the special character isn't being recognized by Excel itself. I was only able to get the text to look like it does above by copying it out of Excel and pasting it into a different IDE. In Excel it displays as:

In the workbook: [screenshot of the workbook]

In the edit field: [screenshot of the formula bar]

In the Immediate window: [screenshot of the Immediate window]

Based on this site it looks like it's having issues displaying the ' character, but how do I get it to fix/filter it out if it can't even read it properly in VBA itself?
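
One way to see what a string really contains, regardless of how Excel or the VBA editor renders it, is to print the code point of each character. The following is only a diagnostic sketch; the function name CodePoints is hypothetical and not part of the original post.

```vba
Option Explicit

' Hypothetical diagnostic helper: returns the Unicode code point of every
' character in text, formatted as "U+XXXX" and separated by spaces.
Function CodePoints(ByVal text As String) As String
    Dim i As Long
    Dim code As Long
    Dim result As String

    For i = 1 To Len(text)
        ' AscW returns a signed Integer, so mask it into the range 0..65535
        code = AscW(Mid$(text, i, 1)) And &HFFFF&
        result = result & "U+" & Right$("0000" & Hex$(code), 4) & " "
    Next i

    CodePoints = Trim$(result)
End Function

Sub DemoCodePoints()
    ' ChrW(8217) is the right single quotation mark (U+2019)
    Debug.Print CodePoints("o" & ChrW(8217) & "s")   ' prints U+006F U+2019 U+0073
End Sub
```

Running this on the cell value shows whether the apostrophe arrived as a single U+2019 character or as several separate characters, which in turn tells you what has to be filtered.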

Option Explicit
Dim regex As New RegExp

Private Function rgclean(ByVal mystring As String) As String

'Removes every character in mystring that matches the regex pattern
'Returns the cleaned string

    With regex

        .Global = True
        .Pattern = "[^ \w]" 'matches anything that is NOT a space or a word character (letter, digit, underscore)...

    End With

    rgclean = regex.Replace(mystring, "") '...and replaces each match with ""

End Function

Try using a regular expression.

Make sure you enable the regular expression library: go to Tools > References and tick the checkbox for "Microsoft VBScript Regular Expressions 5.5".

Pass your string into the function (rgclean). The function matches every character that is not a space or a word character (\w, which covers A-Z, a-z, 0-9, and the underscore), replaces each match with "", and returns the resulting string.

In effect, the function removes nearly every symbol in the string. Letters, digits, spaces, and underscores are kept.
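
If adding a reference is inconvenient, the same pattern can also be used with late binding. This is a minimal sketch rather than the answer's original code: the name CleanWithRegex is made up for illustration, and it assumes a Windows host where the VBScript.RegExp class can be created through CreateObject.

```vba
Option Explicit

' Hypothetical late-bound variant: no library reference is required.
Function CleanWithRegex(ByVal text As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True          ' replace every match, not only the first
    re.Pattern = "[^ \w]"     ' anything that is not a space or a word character

    CleanWithRegex = re.Replace(text, "")
End Function

Sub DemoCleanWithRegex()
    ' ChrW(8217) is the right single quotation mark from the question
    Debug.Print CleanWithRegex("Hero" & ChrW(8217) & "s Village")   ' prints Heros Village
End Sub
```

Because \w in this engine only covers A-Z, a-z, 0-9, and the underscore, accented letters such as â are removed as well.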

Here is the opposite approach. Remove ALL characters that are not included in this group of 62:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

The code:

Const ValidCharacters As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

Function ReplaceSpecialCharacters(myString As String) As String
    
    Dim newString As String, L As Long, i As Long
    Dim char As Variant
    
    newString = myString
    L = Len(newString)
    
    For i = 1 To L
        char = Mid(newString, i, 1)
        If InStr(ValidCharacters, char) = 0 Then
            newString = Replace(newString, char, "@")
        End If
    Next i
    
    ReplaceSpecialCharacters = Replace(newString, "@", "")
    
End Function


Note:

You can also add characters to the string ValidCharacters if you want to retain them.
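
As an illustration of that note, here is a sketch of a variant that also keeps spaces, hyphens, and underscores. The names KeepCharacters and KeepOnlyListed are hypothetical. It builds the result one character at a time instead of marking unwanted characters with "@"; with the function above, "@" is always stripped by the final Replace, even if it is added to ValidCharacters.

```vba
Option Explicit

' Hypothetical whitelist: letters, digits, space, hyphen, underscore.
Const KeepCharacters As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 -_"

Function KeepOnlyListed(ByVal text As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(text)
        ch = Mid$(text, i, 1)
        ' vbBinaryCompare keeps the lookup exact, whatever Option Compare says
        If InStr(1, KeepCharacters, ch, vbBinaryCompare) > 0 Then
            result = result & ch
        End If
    Next i

    KeepOnlyListed = result
End Function

Sub DemoKeepOnlyListed()
    ' ChrW(8217) is the right single quotation mark from the question
    Debug.Print KeepOnlyListed("Hero" & ChrW(8217) & "s Village #1")   ' prints Heros Village 1
End Sub
```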
