
How do I split up a search string to allow for quoted text?

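I would like to build a list of strings from the text in a search field. Anything inside double quotes should be split out as its own entry.

For example: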
sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

should produce

sample' 
string's are, more_  //underscore shown to display space
text
 making
12.34
hello
pineapple
sundays

Edit: here is my (somewhat) elegant solution. Thanks for the help, everyone!

Private Function GetSearchTerms(ByVal searchText As String) As String()
    'Clean search string of unwanted characters'
    searchText = System.Text.RegularExpressions.Regex.Replace(searchText, "[^a-zA-Z0-9""'.,= ]", "")

    'Guarantees the first entry will not be an entry in quotes if the searchkeywords starts with double quotes'
    Dim searches As String() = searchText.Replace("""", " "" ").Split("""")
    Dim myWords As System.Collections.Generic.List(Of String) = New System.Collections.Generic.List(Of String)
    Dim delimiters As String() = New String() {" ", ","}

    For index As Integer = 0 To searches.Length - 1
        'even is regular text, split up into individual search terms'
        If (index Mod 2 = 0) Then
            myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
        Else
            'check for unclosed double quote, if so, split it up and add, space we added earlier will get split out'
            If (searches.Length Mod 2 = 0 AndAlso index = searches.Length - 1) Then
                myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
            Else
                '2 double quotes found'
                'remove the 2 spaces that we added earlier'
                Dim myQuotedString As String = searches(index).Substring(1, searches(index).Length - 2)
                If (myQuotedString.Length > 0) Then
                    myWords.Add(myQuotedString)
                End If
            End If
        End If
    Next
    Return myWords.ToArray()
End Function

Oi, the VB comments are ugly. Does anyone know how to clean this up?

This is a more complex parsing problem than you may realize. I suggest looking at the TextFieldParser class and the FileHelpers library: http://www.filehelpers.com/
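As a rough illustration of the TextFieldParser suggestion, the C# sketch below feeds a search string to `Microsoft.VisualBasic.FileIO.TextFieldParser` with space and comma as delimiters and quoted fields enabled. The class and method names (`ParserTerms`, `SplitTerms`) and the sample input are made up for this example. It assumes a runtime where the `Microsoft.VisualBasic` assembly is available. TextFieldParser expects well-formed quoting and raises `MalformedLineException` on input it cannot parse, so the sketch falls back to a plain split in that case (an assumption about acceptable behavior, not something the answer specifies).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class ParserTerms
{
    // Hypothetical helper: splits one line of search text into terms.
    public static List<string> SplitTerms(string searchText)
    {
        var terms = new List<string>();
        try
        {
            using (var parser = new TextFieldParser(new StringReader(searchText)))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.SetDelimiters(" ", ",");
                parser.HasFieldsEnclosedInQuotes = true;
                parser.TrimWhiteSpace = false; // keep spaces inside quoted terms

                // ReadFields returns null when there is nothing to read.
                string[] fields = parser.ReadFields();
                if (fields != null)
                {
                    foreach (string field in fields)
                    {
                        // Consecutive delimiters produce empty fields; skip them.
                        if (field.Length > 0) terms.Add(field);
                    }
                }
            }
        }
        catch (MalformedLineException)
        {
            // Fallback (assumption): ignore quoting and split on every separator.
            terms.Clear();
            terms.AddRange(searchText.Split(
                new[] { ' ', ',', '"' }, StringSplitOptions.RemoveEmptyEntries));
        }
        return terms;
    }

    static void Main()
    {
        foreach (string term in SplitTerms("alpha \"beta gamma\" 12.34,delta"))
        {
            Console.WriteLine("[" + term + "]");
        }
    }
}
```

Search boxes often receive unbalanced quotes, so a strict parser like this one tends to need a lenient fallback of some kind.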

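This is not a complete solution, since it lacks some validation checks, but it has everything you need.

My CharOccurs() finds every occurrence of '"' and stores the positions, in order, in a list.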

public static List<int> CharOccurs(string stringToSearch, char charToFind)
{
    // Collect the index of every occurrence of charToFind, in order.
    List<int> positions = new List<int>();
    int index = stringToSearch.IndexOf(charToFind);
    while (index != -1)
    {
        positions.Add(index);
        index = stringToSearch.IndexOf(charToFind, index + 1);
    }
    return positions;
}

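The code below is largely illustrative. I treat the text inside the quotes as one block and split it on the '"' character only. I then take a Substring of the text outside the quotes and split that on the ',', space and '"' characters. Please add validation checks to make it generic.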

// Requires: using System; using System.Collections.Generic;
string input = "sample' \"string's are, more \"text\" making\" 12.34,hello\"pineapple sundays";
char[] separators = { ' ', ',', '"' };

List<int> positions = CharOccurs(input, '"');
List<string> output = new List<string>();

if (positions.Count < 2)
{
    // No complete pair of quotes: treat the whole input as plain terms.
    output.AddRange(input.Split(separators, StringSplitOptions.RemoveEmptyEntries));
}
else
{
    // With an odd number of quotes the final one is unmatched, so stop at the one before it.
    int lastPaired = positions.Count % 2 == 0 ? positions.Count - 1 : positions.Count - 2;
    int first = positions[0];
    int last = positions[lastPaired];

    // Text before the first quote: plain terms.
    output.AddRange(input.Substring(0, first).Split(separators, StringSplitOptions.RemoveEmptyEntries));

    // Text between the first and last paired quotes: split on quotes only.
    string withinQuotes = input.Substring(first + 1, last - first - 1);
    output.AddRange(withinQuotes.Split('"'));

    // Text after the last paired quote: plain terms.
    output.AddRange(input.Substring(last + 1).Split(separators, StringSplitOptions.RemoveEmptyEntries));
}
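One way to generalize the idea above is to scan the input once and toggle an "inside quotes" flag at each double quote, rather than computing substring offsets. The sketch below is only one possible reading of "add validation checks"; the names `QuoteAwareTokenizer` and `Tokenize` are hypothetical. It assumes that space and comma separate unquoted terms, that empty terms are dropped, and that a final unmatched quote should behave like an ordinary separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class QuoteAwareTokenizer
{
    // Hypothetical helper: single pass over the input, no offset arithmetic.
    public static List<string> Tokenize(string input)
    {
        var terms = new List<string>();
        var current = new StringBuilder();

        // An odd quote count means the last quote never closes;
        // remember its index so it can be treated as a plain separator.
        int quoteCount = input.Count(c => c == '"');
        int unmatched = quoteCount % 2 == 1 ? input.LastIndexOf('"') : -1;
        bool inQuotes = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool isQuote = c == '"' && i != unmatched;
            bool isSeparator = !inQuotes && (c == ' ' || c == ',' || i == unmatched);

            if (isQuote || isSeparator)
            {
                // Close off the term collected so far, if any.
                if (current.Length > 0)
                {
                    terms.Add(current.ToString());
                    current.Clear();
                }
                if (isQuote) inQuotes = !inQuotes;
            }
            else
            {
                current.Append(c);
            }
        }

        if (current.Length > 0) terms.Add(current.ToString());
        return terms;
    }

    static void Main()
    {
        string input = "sample' \"string's are, more \"text\" making\" 12.34,hello\"pineapple sundays";
        foreach (string term in Tokenize(input))
        {
            Console.WriteLine("[" + term + "]");
        }
    }
}
```

Because spaces and commas are only separators outside quotes, quoted terms keep their inner and trailing spaces, which matches the sample output in the question.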

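I wrote this ParseLine function for VB.NET a few months ago, and it may be useful to you. It works out whether text qualifiers are present and splits the line accordingly. If you would like, I can try converting it to C# in the next few minutes.

You would have a line of text:

sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

You would then pass that in as strLine, with strDataDelimiter = "," and strTextQualifier = """".

Hope this helps.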

Public Function ParseLine(ByVal strLine As String, Optional ByVal strDataDelimiter As String = "", Optional ByVal strTextQualifier As String = "", Optional ByVal strQualifierSplitter As Char = vbTab) As String()
        Try
            Dim strField As String = Nothing
            Dim strNewLine As String = Nothing
            Dim lngChrPos As Integer = 0
            Dim bUseQualifier As Boolean = False
            Dim bRemobedLastDel As Boolean = False
            Dim bEmptyLast As Boolean = False   ' Take into account where the line ends in a field delimiter, the ParseLine function should keep that empty field as well.


            Dim strList As String()

            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000
            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000,
            'TEST,23479234,Just Right 950g,02/04/2006,1234,,,0000,
            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,,
            'TEST,23479234,"Just Right 950g, BO",02/04/2006,,5678,9999,,
            'TEST,23479234,"Just Right"" 950g, BO",02/04/2006,,5678,9999,1111,
            'TEST23479234 'Kellogg''s Just Right 950g' 02/04/2006 1234 5678 0000 9999
            'TEST23479234 '' 02/04/2006 1234 5678 0000 9999

            bUseQualifier = strTextQualifier.Length > 0

            'split data based on options..
            If bUseQualifier Then
                'replace double qualifiers for ease of parsing..
                'strLine = strLine.Replace(New String(strTextQualifier, 2), vbTab)

                'loop and find each field..
                Do Until strLine = Nothing

                    If strLine.Substring(0, 1) = strTextQualifier Then

                        'find closing qualifier
                        lngChrPos = strLine.IndexOf(strTextQualifier, 1)

                        'check for missing double qualifiers, unclosed qualifiers
                        Do Until (strLine.Length() - 1) = lngChrPos OrElse lngChrPos = -1 OrElse _
                          strLine.Substring(lngChrPos + 1, 1) = strDataDelimiter

                            lngChrPos = strLine.IndexOf(strTextQualifier, lngChrPos + 1)
                        Loop

                        'get field from line..
                        If lngChrPos = -1 Then
                            strField = strLine.Substring(1)
                            strLine = vbNullString
                        Else
                            strField = strLine.Substring(1, lngChrPos - 1)
                            If (strLine.Length() - 1) = lngChrPos Then
                                strLine = vbNullString
                            Else
                                strLine = strLine.Substring(lngChrPos + 2)
                                If strLine = "" Then
                                    bEmptyLast = True
                                End If
                            End If

                            'strField = String.Format("{0}{1}{2}", strTextQualifier, strField, strTextQualifier)
                        End If

                    Else
                        'find next delimiter..
                        'lngChrPos = InStr(1, strLine, strDataDelimiter)
                        lngChrPos = strLine.IndexOf(strDataDelimiter)

                        'get field from line..
                        If lngChrPos = -1 Then
                            strField = strLine
                            strLine = vbNullString
                        Else
                            strField = strLine.Substring(0, lngChrPos)
                            strLine = strLine.Substring(lngChrPos + 1)
                            If strLine = "" Then
                                bEmptyLast = True
                            End If
                        End If
                    End If

                    ' Now replace double qualifiers with a single qualifier in the "corrected" string
                    strField = strField.Replace(New String(strTextQualifier, 2), strTextQualifier)

                    'restore double qualifiers..
                    'strField = IIf(strField = vbNullChar, vbNullString, strField)
                    'strField = Replace$(strField, vbTab, strTextQualifier)
                    'strField = IIf(strField = vbTab, vbNullString, strField)
                    'strField = strField.Replace(vbTab, strTextQualifier)

                    'save field to array..
                    strNewLine = String.Format("{0}{1}{2}", strNewLine, strQualifierSplitter, strField)

                Loop

                If bEmptyLast = True Then
                    strNewLine = String.Format("{0}{1}", strNewLine, strQualifierSplitter)
                End If

                'trim off first nullchar..
                strNewLine = strNewLine.Substring(1)

                'split new line..
                strList = strNewLine.Split(strQualifierSplitter)
            Else
                If strLine.Substring(strLine.Length - 1, 1) = strDataDelimiter Then
                    strLine = strLine.Substring(0)
                End If
                'no qualifier.. do a simply split..
                strList = strLine.Split(strDataDelimiter)
            End If

            'return result..
            Return strList

        Catch ex As Exception
            Throw New Exception(String.Format("Error Splitting Special String - {0}", ex.Message), ex)
        End Try
    End Function

If you want an underscore shown before the '"' to indicate the space, you could use the following:

string[] splitString = t.Replace(" \"", "_\"").Split('"');

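Regular expressions for this kind of thing get complicated quickly once you start adding special cases.

Still, for interest and completeness more than anything else: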

(?<term>[a-zA-Z0-9'.=]+)|("(?<term>[^"]+)")
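To show how that pattern might be applied, here is a minimal C# sketch using `System.Text.RegularExpressions`. The class and method names (`RegexTerms`, `Extract`) are hypothetical. It relies on .NET allowing the same group name, `term`, in both alternatives, so each match exposes its text through `Groups["term"]` whether or not it was quoted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexTerms
{
    // Pattern from the answer above: either a bare run of term characters,
    // or anything between a pair of double quotes.
    private static readonly Regex TermPattern =
        new Regex(@"(?<term>[a-zA-Z0-9'.=]+)|(""(?<term>[^""]+)"")");

    // Hypothetical helper: returns the "term" group of every match, in order.
    public static List<string> Extract(string searchText)
    {
        var terms = new List<string>();
        foreach (Match match in TermPattern.Matches(searchText))
        {
            terms.Add(match.Groups["term"].Value);
        }
        return terms;
    }

    static void Main()
    {
        string input = "sample' \"string's are, more \"text\" making\" 12.34,hello\"pineapple sundays";
        foreach (string term in Extract(input))
        {
            Console.WriteLine("[" + term + "]");
        }
    }
}
```

On the sample string from the question this should yield the same eight terms listed there. The final unmatched quote is skipped because the quoted alternative cannot find its closing quote, so the words after it match as bare terms.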

