
Split string on several words, and track which word split it where

I am trying to split a long string based on an array of words. For example:

Words: trying, long, array

Sentence: "I am trying to split a long string based on an array of words."

Resulting string array:

  • I am
  • trying
  • to split a
  • long
  • string based on an
  • array
  • of words

Multiple instances of the same word are likely, so the string will probably be split by, say, two occurrences of "trying" or of "array".

Is there an easy way to do this in .NET?

The easiest way to keep the delimiters in the result is to use the Regex.Split method with a pattern that wraps an alternation in a capturing group. The group is the key: Regex.Split includes the text captured by a group in its result, and drops the delimiters otherwise. The pattern looks like (word1|word2|wordN). You should also always escape each word with the Regex.Escape method so that none of them is misinterpreted as regex metacharacters.

I also recommend reading my answer (and answers of others) to a similar question for further details: How do I split a string by strings and include the delimiters using .NET?

Since I answered that question in C#, here's a VB.NET version:

' Requires: Imports System.Linq and Imports System.Text.RegularExpressions
Dim input As String = "I am trying to split a long string based on an array of words."
Dim words As String() = { "trying", "long", "array" }

If words.Length > 0 Then
    ' The capturing group makes Regex.Split keep the matched words in the result.
    Dim pattern As String = "(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")"
    Dim result As String() = Regex.Split(input, pattern)

    For Each s As String In result
        Console.WriteLine(s)
    Next
Else
    ' Nothing to split on.
    Console.WriteLine(input)
End If

If you need to trim the spaces around each word being split, you can prefix and suffix the pattern with \s* to match the surrounding whitespace:

Dim pattern As String = "\s*(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\s*"

If you're using .NET 4.0 or later, you can drop the ToArray() call inside the String.Join method.

EDIT: You also need to decide up front how the split should work. Should it match only whole words, or also words that are substrings of other words? For example, if your input contained the word "belong", the solution above would split on "long", yielding {"be", "long"}. If that is not what you want, a minor change to the pattern ensures the split matches only complete words: surround the group with the word-boundary metacharacter \b:

Dim pattern As String = "\s*\b(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\b\s*"

The \s* is optional, as mentioned earlier for trimming.
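To make the whole-word distinction concrete, here is a minimal sketch that runs the same split with and without \b. The module name WordBoundaryDemo, the helper BuildPattern, and the sample sentence are made up for illustration; only Regex.Split and Regex.Escape come from the answer above.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module WordBoundaryDemo
    ' Hypothetical helper: builds the escaped alternation inside a capturing
    ' group, optionally wrapped in \b so that only whole words match.
    Public Function BuildPattern(words As String(), wholeWordsOnly As Boolean) As String
        Dim alternation As String =
            String.Join("|", words.Select(Function(w As String) Regex.Escape(w)).ToArray())
        If wholeWordsOnly Then
            Return "\b(" & alternation & ")\b"
        End If
        Return "(" & alternation & ")"
    End Function

    Sub Main()
        Dim input As String = "They belong to a long list"
        Dim words As String() = New String() {"long"}

        ' Without \b, "belong" is split as well: 5 parts.
        Dim loose As String() = Regex.Split(input, BuildPattern(words, False))
        Console.WriteLine("Substring match: " & loose.Length & " parts")
        For Each part As String In loose
            Console.WriteLine("  [" & part & "]")
        Next

        ' With \b, only the standalone "long" splits the string: 3 parts.
        Dim strict As String() = Regex.Split(input, BuildPattern(words, True))
        Console.WriteLine("Whole-word match: " & strict.Length & " parts")
        For Each part As String In strict
            Console.WriteLine("  [" & part & "]")
        Next
    End Sub
End Module
```

Note that \b only behaves as a whole-word guard when each split word starts and ends with a word character (a letter, digit, or underscore).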

You could use a regular expression.

(.*?)((?:trying)|(?:long)|(?:array))(.*)

will give you three groups if it matches:

  • 1) The bit before the first instance of any of the split words.
  • 2) The split word itself.
  • 3) The rest of the string.

You can keep matching on (3) until you run out of matches.

I've played around with this but I can't get a single regex that will split on all instances of the target words. Maybe someone with more regex-fu can explain how.

I've assumed that VB has regex support. If not, I'd recommend using a different language. Certainly C# has regexes.
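The "keep matching on (3)" loop described above could be sketched in VB.NET roughly as follows. The module and function names are hypothetical, and RegexOptions.Singleline is an addition of this sketch (not part of the pattern as posted) so that text containing line breaks is not silently skipped.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RepeatedMatchDemo
    ' Hypothetical helper: applies the three-group pattern repeatedly,
    ' feeding group 3 (the rest of the string) back in until nothing matches.
    Public Function SplitByRepeatedMatch(input As String) As List(Of String)
        ' Singleline is an assumption added here so that "." also matches newlines.
        Dim pattern As New Regex("(.*?)((?:trying)|(?:long)|(?:array))(.*)",
                                 RegexOptions.Singleline)
        Dim parts As New List(Of String)
        Dim rest As String = input

        Dim m As Match = pattern.Match(rest)
        While m.Success
            parts.Add(m.Groups(1).Value)   ' text before the split word
            parts.Add(m.Groups(2).Value)   ' the split word itself
            rest = m.Groups(3).Value       ' remainder, matched again next round
            m = pattern.Match(rest)
        End While

        parts.Add(rest)                    ' whatever is left after the last split word
        Return parts
    End Function

    Sub Main()
        Dim sentence As String = "I am trying to split a long string based on an array of words."
        ' Yields 7 parts; surrounding spaces are kept, e.g. "I am " and " to split a ".
        For Each part As String In SplitByRepeatedMatch(sentence)
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

This produces the same pieces as a single Regex.Split call with a capturing group, just with more bookkeeping; it may still be handy if you want to act on each split word as it is found.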

Peter, I hope the following suits your needs. It splits a string by an array of words using Regex:

// Input
String input = "insert into tbl1 inserttbl2 insert into tbl2 update into tbl3 " +
    "updatededle into tbl4 update into tbl5";

// Regex: split on whitespace that is followed by a whole keyword and more whitespace
String[] arrResult = Regex.Split(input, @"\s+(?=(?:insert|update|delete)\s+)",
    RegexOptions.IgnoreCase);

// Output
// [0]: "insert into tbl1 inserttbl2"
// [1]: "insert into tbl2"
// [2]: "update into tbl3 updatededle into tbl4"
// [3]: "update into tbl5"

You can split on " " and then iterate over the words, checking which of them are contained in the "split words" array.
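The split-on-spaces idea can be sketched as follows. This is only one possible reading of it: the module and function names are hypothetical, tokens are compared exactly and case-sensitively, and a split word with punctuation attached (such as "array,") would not be recognised.

```vbnet
Imports System
Imports System.Collections.Generic

Module SpaceSplitDemo
    ' Hypothetical helper: tokenises on spaces, then groups the tokens into
    ' runs of ordinary words separated by the split words themselves.
    Public Function SplitOnWords(input As String, splitWords As String()) As List(Of String)
        Dim parts As New List(Of String)
        Dim current As New List(Of String)   ' ordinary words collected since the last split word

        For Each token As String In input.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            If Array.IndexOf(splitWords, token) >= 0 Then
                ' Flush the words gathered so far, then emit the split word on its own.
                If current.Count > 0 Then
                    parts.Add(String.Join(" ", current.ToArray()))
                    current.Clear()
                End If
                parts.Add(token)
            Else
                current.Add(token)
            End If
        Next

        If current.Count > 0 Then
            parts.Add(String.Join(" ", current.ToArray()))
        End If
        Return parts
    End Function

    Sub Main()
        Dim sentence As String = "I am trying to split a long string based on an array of words."
        Dim splitWords As String() = New String() {"trying", "long", "array"}
        ' Yields 7 parts: "I am", "trying", "to split a", "long", ...
        For Each part As String In SplitOnWords(sentence, splitWords)
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

Because the tokens are rejoined with single spaces, the original spacing is normalised, and whole-word matching comes for free ("belong" never matches "long").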

    Dim testS As String = "I am trying to split a long string based on an array of words."

    Dim splitON() As String = New String() {"trying", "long", "array"}

    Dim newA() As String = testS.Split(splitON, StringSplitOptions.RemoveEmptyEntries)

Something like this:

    Dim testS As String = "I am trying to split a long string based on a long array of words."

    Dim splitON() As String = New String() {"long", "trying", "array"}

    Dim result As New List(Of String)
    result.Add(testS)

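    ' Process one split word at a time, re-splitting every piece produced so far.
    For Each spltr As String In splitON
        Dim NewResult As New List(Of String)
        For Each s As String In result
            ' Strings.Split comes from the Microsoft.VisualBasic namespace.
            Dim a() As String = Strings.Split(s, spltr)
            If a.Length <> 0 Then
                For z As Integer = 0 To a.Length - 1
                    ' Keep non-empty pieces and put the split word back after each one.
                    If a(z).Trim <> "" Then NewResult.Add(a(z).Trim)
                    NewResult.Add(spltr)
                Next
                ' The loop adds one split word too many after the last piece.
                NewResult.RemoveAt(NewResult.Count - 1)
            End If
        Next
        result = New List(Of String)
        result.AddRange(NewResult)
    Next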


 