
Search a string for particular characters and extract the words they appear in (C#)

Does anyone know how I can search a string for particular characters and extract the whole word they appear in? If a word contains the character(s), how can I split the string around that word? Here's an example of what I'm trying to do. The input sentence (a string) is "We both arrived at the garage this morning". I want to search that string for all occurrences of the characters "ar". If any word contains those two letters, I'd like to split the string on those words. So in this example, the split string would look like:

Element 1: "We both"
Element 2: "arrived"
Element 3: "at the"
Element 4: "garage"
Element 5: "this morning"

There is probably a better way; however, after taking a look at this problem I wrote my own split function.

A quick breakdown of the function goes as follows.

  • Find the first occurrence of the split string, in this case ar.
    • If there are no occurrences, return the input
  • Temporarily remove everything after this occurrence
    • In our first case this would leave us with the string "We both "
  • Find the last occurrence of a space to give us full words only
    • This will give us "We both"
    • If there is no space before the occurrence, the word containing the split string is the first word of the input, so there is nothing to add before it
  • Add these words to a list
  • Go back to the remaining part of the string, "arrived at the garage this morning", and find the next space. This gives us the word that contains the split string; add it to the list
  • After removing this word, the remaining part of the string is "at the garage this morning"
  • Recursively call this function on the remainder until no more occurrences of ar are found

private List<string> SplitOnFullWords(string input, string split)
{
    List<string> result = new List<string>();

    int firstIndexOfSplit = input.IndexOf(split);

    // We have found an occurrence of the split string; take everything before it.
    if (firstIndexOfSplit >= 0)
    {
        string splitString = input.Substring(0, firstIndexOfSplit);

        // Find the last occurrence of a space before this index; this gives us all the full words before it
        int lastIndexOfSpace = splitString.LastIndexOf(' ');

        // If there is a space before the matching word, there are full words in front of it to add first
        if (lastIndexOfSpace >= 0)
        {
            // Add the words before the word containing the splitter string
            result.Add(splitString.Substring(0, lastIndexOfSpace));

            // Add the word containing the splitter string
            string remainingString = input.Substring(lastIndexOfSpace + 1);
            int firstSpaceAfterWord = remainingString.IndexOf(' ');

            if (firstSpaceAfterWord >= 0)
            {
                result.Add(remainingString.Substring(0, firstSpaceAfterWord));

                // Look for more after the word containing the splitter string
                string finalString = remainingString.Substring(firstSpaceAfterWord + 1);
                result.AddRange(SplitOnFullWords(finalString, split));
            }
            else
            {
                result.Add(remainingString);
            }
        }
        else
        {
            // No space before the match: the word containing the splitter string is the first word, so add it directly
            int firstSpaceAfterWord = input.IndexOf(' ');

            if (firstSpaceAfterWord >= 0)
            {
                result.Add(input.Substring(0, firstSpaceAfterWord));

                // Look for more after the word containing the splitter string
                string finalString = input.Substring(firstSpaceAfterWord + 1);
                result.AddRange(SplitOnFullWords(finalString, split));
            }
            else
            {
                result.Add(input);
            }
        }
    }
    else
    {
        // No occurrences of the split string, just return the input
        result.Add(input);
    }

    return result;
}

And to use it (inputWord is the sentence to split):

foreach (string word in SplitOnFullWords(inputWord, "ar"))
    Console.WriteLine(word);

Here's a solution using two regexes. The first finds the matching words; the second splits the string on those words.

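// Needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
string sentence = "We both arrived at the garage this morning";
string search = "ar";

// Word boundary, optional word characters, search term, optional word characters again, word boundary.
// Regex.Escape keeps special characters in the search term from being read as regex syntax.
string regex = @"\b\w*" + Regex.Escape(search) + @"\w*\b";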

// find words matching the search term
var foundWords = Regex.Matches(sentence, regex)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToList();

List<string> result = null;
if (foundWords.Count == 0)
{
    // If no words were found, use the original sentence.
    result = new List<string> { sentence };
}
else
{
    // Create a split term containing the found words.
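    // Each word is escaped, wrapped in word boundaries so that a short match cannot split a longer word,
    // and captured in a group so that Regex.Split keeps it in the output.
    var splitTerm = string.Join("|", foundWords.Distinct().Select(w => @"(\b" + Regex.Escape(w) + @"\b)"));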

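    // Split the sentence on the found words, trim spaces from the parts, and drop empty parts
    // (these appear when the sentence starts or ends with a match, or when two matches are adjacent).
    result = Regex.Split(sentence, splitTerm)
        .Select(part => part.Trim())
        .Where(part => part.Length > 0)
        .ToList();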
}

foreach (var part in result)
{
    Console.WriteLine(part);
}
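
The two steps can probably be collapsed into one, because Regex.Split includes the text of capturing groups in its output. The following is only a sketch under that assumption, written as top-level statements (C# 9 or later); the helper name SplitOnWordsContaining is made up for illustration and is not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for this example only).
static string[] SplitOnWordsContaining(string sentence, string search)
{
    // The capturing group around the whole word makes Regex.Split
    // return the matched words alongside the text between them.
    string pattern = @"(\b\w*" + Regex.Escape(search) + @"\w*\b)";

    return Regex.Split(sentence, pattern)
        .Select(part => part.Trim())        // remove the spaces left next to each match
        .Where(part => part.Length > 0)     // drop empty parts at the edges or between adjacent matches
        .ToArray();
}

foreach (string part in SplitOnWordsContaining("We both arrived at the garage this morning", "ar"))
{
    Console.WriteLine(part);
}
```

For the example sentence this prints the five elements from the question: "We both", "arrived", "at the", "garage" and "this morning".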

Split the sentence into words, and then build a list of strings, checking whether each word contains the given characters.

string sentence = "We both arrived at the garage this morning";
string[] words = sentence.Split();
List<string> results = new List<string>();

string s = "";

foreach (string word in words)
{
    if (word.Contains("ar"))
    {
        if (s != "")
        {
            results.Add(s.Trim());
            s = "";
        }
        results.Add(word);
    }
    else
    {
        s += word + " ";
    }
}
// Add any words left over after the last match, without the trailing space.
if (s != "")
    results.Add(s.Trim());

// results contains the desired strings.
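
One thing to keep in mind is that string.Contains is case-sensitive, so a sentence that starts with "Arrived" would not match "ar". The sketch below shows one possible way to wrap the same loop in a method that ignores case; the name GroupByKeyword and the choice of OrdinalIgnoreCase are assumptions made for this example, and it is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (name chosen for this example only).
static List<string> GroupByKeyword(string sentence, string keyword)
{
    var results = new List<string>();
    var pending = new List<string>();   // non-matching words seen since the last match

    foreach (string word in sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // IndexOf with OrdinalIgnoreCase lets "Arrived" match "ar" as well.
        if (word.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
        {
            if (pending.Count > 0)
            {
                results.Add(string.Join(" ", pending));
                pending.Clear();
            }
            results.Add(word);
        }
        else
        {
            pending.Add(word);
        }
    }

    // Flush the words that follow the last match.
    if (pending.Count > 0)
    {
        results.Add(string.Join(" ", pending));
    }

    return results;
}

foreach (string item in GroupByKeyword("Arrived first at the garage", "ar"))
{
    Console.WriteLine(item);
}
```

Collecting the pending words in a list and joining them with string.Join avoids the trailing space, so no Trim call is needed.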

This is probably not the most performant way to do it, but it worked for me.

static void Main(string[] args)
{
    // Set up variables.
    string example = "We both arrived at the garage this morning";
    string searchTerm = "ar";
    var intermediateArray = new List<string>();
    var answerArray = new List<string>();

    // Split on ' ' to isolate the words.
    var exampleArray = example.Split(' ');

    // Loop through each word in the original string.
    foreach (var word in exampleArray)
    {
        if (word.Contains(searchTerm))
        {
            // Add the words collected so far (if any) as a single string,
            // then add the word that contains the search term.
            if (intermediateArray.Count > 0)
            {
                answerArray.Add(string.Join(" ", intermediateArray));
                intermediateArray.Clear();
            }

            answerArray.Add(word);
        }
        else
        {
            // The word does not contain the search term; keep it for the next group.
            intermediateArray.Add(word);
        }
    }

    // Add any words left over after the last match (here: "this morning").
    if (intermediateArray.Count > 0)
    {
        answerArray.Add(string.Join(" ", intermediateArray));
    }

    // Print the result to demonstrate that it works as intended.
    foreach (var text in answerArray)
    {
        Console.WriteLine(text);
    }
}

This is a bit of a roundabout way, but it will get the job done. I am going to assume that you define a 'word' by strings delimited by spaces.

var line = "We both arrived at the garage this morning";
var keyword = "ar";

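Splitting the line on spaces, as the loop further down does, gives you a list of the 'words' in your sentence.

The following string list will contain your results. It's important that it starts with one empty string at the first index, because non-matching words are appended to the last item.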

var resultList = new List<string>() { string.Empty };

var parts = line.Split(' ').ToList();
for (int i = 0; i < parts.Count; i++)
{
    // If the word contains your keyword, add it as a new item in the list.
    // Next add new item that is an empty string.
    if (parts[i].Contains(keyword))
    {
        resultList.Add(parts[i]);
        resultList.Add(string.Empty);
    }
    // Otherwise, add the word to the last item, and then add a space at the end to separate words.
    else
    {
        resultList[resultList.Count - 1] = resultList[resultList.Count - 1] + parts[i] + " ";
    }
}

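The loop above leaves trailing spaces on some items, so you can trim them off. The list can also contain empty strings (for example, when the sentence starts or ends with a matching word), which you may want to remove.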

for (int i = 0; i < resultList.Count; i++)
{
    if (resultList[i].EndsWith(" "))
        resultList[i] = resultList[i].TrimEnd(new char[] { ' ' });
}

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address.

 