Issues stripping special characters from text in VBA

I have an Excel file that pulls in data from a CSV, manipulates it a bit, and then saves it as a series of text files.

There are some special characters in the source data that trip things up, so I added this to strip them out:

Const SpecialCharacters As String = "!,@,#,$,%,^,&,*,(,),{,[,],},?,â,€,™"

Function ReplaceSpecialCharacters(myString As String) As String

Dim newString As String
Dim char As Variant

newString = myString

For Each char In Split(SpecialCharacters, ",")
    newString = Replace(newString, char, "")
Next

ReplaceSpecialCharacters = newString

End Function

The issue is that this doesn't catch all of them. When I try to process the following text, it slips through the above code and causes Excel to error out.

Hero’s Village

I think the issue is that the special character isn't being recognized by Excel itself. I was only able to get the text to look like it does above by copying it out of Excel and pasting it into a different IDE. In Excel it displays as:

In the workbook: [screenshot of the workbook]

In the edit field: [screenshot of the formula bar]

In the Immediate window: [screenshot of the Immediate window]

Based on this site, it looks like Excel is having issues displaying the ' character, but how do I fix it or filter it out if VBA itself can't even read it properly?
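One way to see what VBA is actually holding is to print the code point of every character in the string. The sketch below is only a diagnostic aid; `DumpCharCodes` is a hypothetical name, and the sample string assumes the cell contains the three characters shown above (â, €, ™, that is U+00E2, U+20AC, U+2122). Those three are what the UTF-8 bytes of a right single quotation mark (U+2019) look like when decoded as Windows-1252, which suggests, without proving it, that the CSV is UTF-8 and is being read with an ANSI code page.

```vba
Option Explicit

' DumpCharCodes is a hypothetical diagnostic helper, not part of the original code.
' It prints the position and Unicode code point of each character in source.
Public Sub DumpCharCodes(ByVal source As String)
    Dim i As Long
    Dim code As Long

    For i = 1 To Len(source)
        ' AscW returns a signed Integer; masking maps it into 0..65535
        code = AscW(Mid$(source, i, 1)) And &HFFFF&
        Debug.Print i, "U+" & Right$("0000" & Hex$(code), 4)
    Next i
End Sub

Public Sub DemoDumpCharCodes()
    ' Assumed sample: "Hero" + U+00E2 + U+20AC + U+2122 + "s Village"
    DumpCharCodes "Hero" & ChrW(&HE2) & ChrW(&H20AC) & ChrW(&H2122) & "s Village"
End Sub
```

Run `DemoDumpCharCodes` with the Immediate window open (Ctrl+G), or pass a cell value such as `ActiveCell.Value` to `DumpCharCodes`, to check which code points actually need to be stripped.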

Option Explicit

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Private Function rgclean(ByVal mystring As String) As String

    ' Removes every character that is not a space or a word character
    ' (\w matches A-Z, a-z, 0-9 and the underscore) and returns the result.
    Dim regex As New RegExp

    With regex
        .Global = True          ' replace every match, not only the first
        .Pattern = "[^ \w]"     ' anything other than a space or a word character
        rgclean = .Replace(mystring, "")
    End With

End Function

Try using a regular expression.

Make sure you enable regular expressions under Tools > References by ticking the checkbox "Microsoft VBScript Regular Expressions 5.5".

Pass the string into the function (rgclean). The function finds anything that is not a space, a letter [A-Za-z], a digit [0-9], or an underscore, replaces it with "", and returns the resulting string.

The function removes practically every symbol in the string. Digits, spaces, and word characters are kept.
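If adding a project reference is inconvenient, the same pattern can also be used with late binding. This is a minimal sketch, assuming a Windows machine where the `VBScript.RegExp` COM class is available; `CleanWithRegex` and `DemoCleanWithRegex` are hypothetical names introduced here for illustration.

```vba
Option Explicit

' Late-bound variant: no Tools > References entry is needed.
' CleanWithRegex is a hypothetical name, not part of the answer above.
Public Function CleanWithRegex(ByVal source As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    With re
        .Global = True          ' replace every match
        .Pattern = "[^ \w]"     ' anything other than a space or a word character
        CleanWithRegex = .Replace(source, "")
    End With
End Function

Public Sub DemoCleanWithRegex()
    Dim sample As String
    ' "Hero" + U+00E2 + U+20AC + U+2122 + "s Village", built with ChrW so the
    ' code editor's own encoding does not matter
    sample = "Hero" & ChrW(&HE2) & ChrW(&H20AC) & ChrW(&H2122) & "s Village"
    Debug.Print CleanWithRegex(sample)   ' prints: Heros Village
End Sub
```

Because `\w` in this engine covers only A-Z, a-z, 0-9 and the underscore, accented letters such as é are removed as well; add them to the character class if they need to survive.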

Here is the opposite approach. Remove ALL characters that are not in this group of 62:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

The code:

Const ValidCharacters As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

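Function ReplaceSpecialCharacters(myString As String) As String
    
    Dim newString As String, L As Long, i As Long
    Dim char As String
    
    newString = myString
    L = Len(newString)
    
    ' Mark every character that is not in ValidCharacters with "@".
    ' Each swap is one character for one character, so L stays correct.
    For i = 1 To L
        char = Mid(newString, i, 1)
        If InStr(ValidCharacters, char) = 0 Then
            newString = Replace(newString, char, "@")
        End If
    Next i
    
    ' "@" is not a valid character itself, so removing it drops all the marks
    ReplaceSpecialCharacters = Replace(newString, "@", "")
    
End Function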


Note:

You can also add characters to the string ValidCharacters if you want to retain them.
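As an illustration of extending the allowed set, here is a sketch that takes the allowed characters as a parameter and builds the result directly, rather than going through a placeholder character. `KeepOnly`, `DemoKeepOnly` and the choice to retain the space and the hyphen are assumptions made for this example only.

```vba
Option Explicit

' KeepOnly is a hypothetical helper: it returns the characters of source
' that appear in allowed, in their original order.
Public Function KeepOnly(ByVal source As String, ByVal allowed As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(source)
        ch = Mid$(source, i, 1)
        ' vbBinaryCompare keeps the lookup exact and case-sensitive
        If InStr(1, allowed, ch, vbBinaryCompare) > 0 Then
            result = result & ch
        End If
    Next i

    KeepOnly = result
End Function

Public Sub DemoKeepOnly()
    ' The 62 letters and digits plus a space and a hyphen
    Const Allowed As String = _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 -"
    Debug.Print KeepOnly("Hero's Village - #1", Allowed)   ' prints: Heros Village - 1
End Sub
```

Passing the allowed set as an argument means different output files can use different rules without editing a module-level constant.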
