VBA text qualifier cleaning routine

I am writing a block of code that reads a line of text from a file of comma-delimited data and cleans out any extra text qualifiers (""). Every field needs exactly one set. The trouble is that the files have no line breaks (this can't be changed), so everything is read in as one long string. Everything is fine up until the point where one line is supposed to switch over to the next: there, the normal ,"","", pattern runs up against ,"""", at the line end. I know how many fields there are in each line, so finding this field isn't a problem. I have been racking my brain for four hours trying to figure out how to differentiate the two sets of text qualifiers while also having the cleaning part clear out any extras, if there are any. These are examples that would need to be cleaned:

,""10/20/18"""",        >>> ,"10/20/18""",  
,"10/20/18"""4380012"", >>> ,"10/20/18""4380012",  
,"10/20"/18""4380012",  >>> ,"10/20/18""4380012",
,""""4380012",          >>> ,"""4380012",
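A quick first check, following the "good path = 4, bad path > 4" comment in the code attempt, is to count the qualifiers inside the merged field: two fields that have run together cleanly, such as "10/20/18""4380012", contain exactly four quotes. This is a minimal sketch that assumes the text between the two outer commas has already been extracted; `CountQuotes` and `HasExtraQualifiers` are hypothetical helper names, not part of the original code.

```vba
' Hypothetical helper: counts the text qualifiers (") in a piece of text
' by comparing its length before and after deleting every quote.
Public Function CountQuotes(ByVal fieldText As String) As Long
    CountQuotes = Len(fieldText) - Len(Replace(fieldText, Chr$(34), ""))
End Function

' Hypothetical helper: a clean merged field such as "left""right" has
' exactly four quotes, so anything above four means there are extras.
Public Function HasExtraQualifiers(ByVal fieldText As String) As Boolean
    HasExtraQualifiers = (CountQuotes(fieldText) > 4)
End Function
```

A count alone cannot say where the extras are, but it cheaply separates the happy path from the fields that need cleaning.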

My first idea was to mark the positions of the two outer commas and save the distance between them. I know that, at the very least, there is supposed to be a pair of quotes between the fields, so I thought moving i and i+1 through the field might be a good way to figure out where that pair is supposed to go, but I couldn't work out the best way to set this up.
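Marking the two outer commas can be sketched with `InStr`, starting the search just after the leading comma (the position the attempt stores in `Counter`). This assumes no field value ever contains a comma; `FieldAfterComma` is a hypothetical name.

```vba
' Hypothetical helper: given the position of a field's leading comma,
' find the next comma and return the text between the two.
' Assumes no field value contains a comma.
Public Function FieldAfterComma(ByVal s As String, ByVal leftComma As Long) As String
    Dim rightComma As Long
    rightComma = InStr(leftComma + 1, s, ",")
    ' No trailing comma: treat the end of the string as the field's end
    If rightComma = 0 Then rightComma = Len(s) + 1
    ' The distance between the commas, minus one, is the field's length
    FieldAfterComma = Mid$(s, leftComma + 1, rightComma - leftComma - 1)
End Function
```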
Then I thought about walking in from both sides, doing i = i + 1 from the left and n = n - 1 from the right, but I ran into similar difficulties.
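Walking in from both ends does work well for one narrower job: skipping the runs of qualifiers at the outer edges until each pointer reaches data. The sketch below (the name `StripOuterQuotes` is hypothetical) does only that; deciding how the quotes left in the middle split between the two fields remains a separate step.

```vba
' Hypothetical helper: moves i right past leading quotes and n left past
' trailing quotes, then returns what lies between them.
' Returns "" when the text is empty or consists only of quotes.
Public Function StripOuterQuotes(ByVal fieldText As String) As String
    Dim i As Long, n As Long
    i = 1
    n = Len(fieldText)
    ' Walk in from the left
    Do While i <= n
        If Mid$(fieldText, i, 1) <> Chr$(34) Then Exit Do
        i = i + 1
    Loop
    ' Walk in from the right; n never drops below i, so Mid$ stays valid
    Do While n >= i
        If Mid$(fieldText, n, 1) <> Chr$(34) Then Exit Do
        n = n - 1
    Loop
    If n < i Then
        StripOuterQuotes = ""
    Else
        StripOuterQuotes = Mid$(fieldText, i, n - i + 1)
    End If
End Function
```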

The biggest problem was when there were extra qualifiers on the inner or outer edges and the other field was blank.

'My string (the whole file, read in as one line)
Dim strLn As String
'Total number of quotes, good path = 4, bad path > 4
Dim QuoteTotal As Integer
'The first quote's position
Dim FirstQ As Integer
'The ending quote's position
Dim RightQ As Integer
'Counter equals the position of the leading comma
Dim Counter As Integer
'Holds the quote character for use throughout the code
Dim Q As String
Q = Chr(34)
'Holds the comma character for use throughout the code
Dim C As String
C = Chr(44)
'Holds the position of the last quote and the number of quotes between the fields
Dim LFieldQ As Integer
Dim LFieldQTotal As Integer
LFieldQ = 0
LFieldQTotal = 0
'Tracks position and length of the first data we find moving left to right
Dim LData As Integer
Dim LDatalen As Integer
'Loop counters
Dim i As Integer
Dim n As Integer

    For i = Counter To Len(strLn)
        If (Mid(strLn, i, 1) = Q) Then
            LFieldQTotal = LFieldQTotal + 1
        End If
        If (Mid(strLn, i + 1, 1) = Q And Mid(strLn, i + 2, 1) = C) Then
            LFieldQTotal = LFieldQTotal + 1
            LFieldQ = i + 1
            Exit For
        End If
    Next

    If (LFieldQTotal <> 4) Then
        For i = FirstQ + 1 To LFieldQ
            If (Mid(strLn, i, 1) = Q) Then
                strLn = Mid(strLn, 1, i - 1) & Replace(strLn, Q, "", i, 1)
            ElseIf (Mid(strLn, i, 1) <> Q) Then
                LData = i
                For n = i To LFieldQ
                    If (Mid(strLn, n, 1) = Q) Then
                        LDatalen = n - 1
                        i = n
                        Exit For
                    End If
                Next
            End If
        Next
    End If

(I'm aware this is incomplete.)
This block was my current attempt at walking i and i+1 through the field. I keep getting caught up on handling happy-path exits, but I shouldn't, because the outer If statement has already determined that we aren't going to find a happy path.
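One detail worth knowing when deleting characters inside a loop: when VBA's `Replace` is given a `start` argument, the string it returns begins at `start`, which is why the attempt above has to prepend `Mid(strLn, 1, i - 1)`. A more direct way to delete one character is `Left$` plus `Mid$`, shown here in a hypothetical helper `RemoveCharAt`. Also, after a deletion at position i the next character shifts into position i, so a forward loop that then moves on to i + 1 skips it, and any saved end position such as `LFieldQ` becomes one too large. Walking from right to left avoids both issues, as in the hypothetical `RemoveQuotesBetween` below.

```vba
' Hypothetical helper: returns s with the single character at 1-based
' position pos removed; out-of-range positions leave s unchanged.
Public Function RemoveCharAt(ByVal s As String, ByVal pos As Long) As String
    If pos < 1 Or pos > Len(s) Then
        RemoveCharAt = s
    Else
        RemoveCharAt = Left$(s, pos - 1) & Mid$(s, pos + 1)
    End If
End Function

' Hypothetical helper: removes every quote from position lastPos down to
' firstPos (inclusive, assumes firstPos >= 1). Walking right to left means
' each deletion only shifts characters that have already been visited.
Public Function RemoveQuotesBetween(ByVal s As String, ByVal firstPos As Long, _
                                    ByVal lastPos As Long) As String
    Dim i As Long
    For i = lastPos To firstPos Step -1
        If Mid$(s, i, 1) = Chr$(34) Then s = RemoveCharAt(s, i)
    Next i
    RemoveQuotesBetween = s
End Function
```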
I want the cleaning step to be able to remove any number and any combination of extra qualifiers. That may be an unnecessary requirement, because in all honesty there will only ever be, at most, one complete extra set in a field, but I want it to work for every eventuality. Any thoughts would be appreciated. I'm not looking for a solved block of code, but any push in the right direction would be great.

A piece of information I didn't know at the beginning is that there will always be a value in the right-hand field, so I only have to deal with examples such as:

,"""""4380012",         >>> ,"""4380012",
,""""4380012"",         >>> ,"""4380012",
,""10/20/18"""4380012", >>> ,"10/20/18""4380012",

This made the ambiguity about where the fields lie within a run of quotes much easier to manage. Below is my final product, which appears to work in all the cases I need it to.

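Private Sub TestSub()
Dim strLn As String
Dim i As Integer
Dim n As Integer
Dim s As Integer
Dim Q As String
'Position of the closing quote of the merged field
Dim LfieldQ As Integer
'Position of the opening quote of the merged field; the loops never remove it
Dim FirstQ As Integer
Q = Chr(34)

strLn = "," & Q & Q & Q & "ABC" & Q & Q & Q & Q & Q & "DEF" & Q & ","
'In this test string the opening quote sits right after the leading comma
'and the closing quote right before the trailing comma
FirstQ = 2
LfieldQ = Len(strLn) - 1
MsgBox (strLn)
n = LfieldQ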

Do While n > FirstQ
    If (Mid(strLn, n, 1) = Q And Mid(strLn, n - 1, 1) = Q And (n = LfieldQ)) Then
        strLn = Mid(strLn, 1, n - 2) & Replace(strLn, Q, "", n - 1, 1)
        LfieldQ = LfieldQ - 1
        n = n + 1
    ElseIf (Mid(strLn, n, 1) = Q And Mid(strLn, n - 1, 1) <> Q And (n = LfieldQ)) Then
        For i = n - 1 To FirstQ Step -1
            If (Mid(strLn, i, 1) = Q) Then
                n = i
                Exit For
            End If
        Next i
        If (Mid(strLn, n, 1) = Q And Mid(strLn, n - 1, 1) = Q) Then
            n = n - 1
            For i = n - 1 To FirstQ Step -1
                If (Mid(strLn, i, 1) = Q And i <> FirstQ) Then
                    strLn = Mid(strLn, 1, i - 1) & Replace(strLn, Q, "", i, 1)
                ElseIf (Mid(strLn, i, 1) <> Q) Then
                    For s = i To FirstQ Step -1
                        If (Mid(strLn, s, 1) = Q And s <> FirstQ) Then
                            strLn = Mid(strLn, 1, s - 1) & Replace(strLn, Q, "", s, 1)
                            Exit For
                        End If
                    Next
                End If
            Next
        End If
    End If
    n = n - 1
Loop
MsgBox (strLn)
End Sub
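With the guarantee that the right-hand field always holds a value, the same cleanup could also be expressed without moving pointers. The following is an alternative sketch, not the routine above: it assumes the text between the two outer commas has already been extracted and that the right-hand value itself never contains a quote. It trims the qualifier runs from both ends, treats everything after the last remaining quote as the right-hand value, and deletes every quote from the left-hand part. `CleanMergedField` is a hypothetical name.

```vba
' Hypothetical alternative: rebuilds a merged field as "left""right".
' Assumes the right-hand value is never empty and contains no quotes.
Public Function CleanMergedField(ByVal fieldText As String) As String
    Const Q As String = """"   ' a single quote character
    Dim core As String, sepPos As Long
    Dim leftVal As String, rightVal As String

    core = fieldText
    ' Trim the qualifier runs off both ends
    Do While Left$(core, 1) = Q
        core = Mid$(core, 2)
    Loop
    Do While Right$(core, 1) = Q
        core = Left$(core, Len(core) - 1)
    Loop

    ' The right-hand value is everything after the last remaining quote
    sepPos = InStrRev(core, Q)
    If sepPos = 0 Then
        leftVal = ""          ' no quote left: the left field was blank
        rightVal = core
    Else
        rightVal = Mid$(core, sepPos + 1)
        ' Drop every quote from the left part (separator run and strays)
        leftVal = Replace(Left$(core, sepPos), Q, "")
    End If

    CleanMergedField = Q & leftVal & Q & Q & rightVal & Q
End Function
```

The caller would splice the result back between the two commas. If the right-hand value could ever contain a quote, the last-quote rule would split the field in the wrong place, so this sketch would need another way to find the boundary.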

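Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.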