How to detect certain symbols in a string and use them to mark special words?

I would like to detect certain symbols in a string and use them to mark special words.

Here's an example:

Symbols to mark special words: Action - ѧ; Job - Ꝭ; Location - լ; Number - ₦;

Example: ₦7₦ hundred ꝬclerkꝬs have said “լOhioլ is ѧdyingѧ”. We are ₦100₦% sure about this.

Each special word needs to be put in the corresponding array.

I haven't made anything like this before.

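This LINQ query could probably be optimized, but here is one way to collect the marked words into a Dictionary. The using directives below are what the snippet needs to compile:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;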

string s = "₦7₦ hundred ꝬclerkꝬs have said “լOhioլ is ѧdyingѧ”. We are ₦100₦% sure about this.";
char[] special = { '₦', 'Ꝭ', 'լ', 'ѧ' };
string[] words = s.Split(" ");

Dictionary<char, string[]> wordDict = special
    .ToDictionary(
        c => c,
        c => words
            .Where(w => Regex.IsMatch(w, $"{c}.+{c}")) //match on flagged char + any length of characters + flagged char
            .Select(w => w.Split(c)[1]) //throw out fluff at beginning and end
            .ToArray()
    );

foreach (KeyValuePair<char, string[]> kv in wordDict)
{
    Console.WriteLine($"{kv.Key} - {string.Join(", ", kv.Value)}");
}

/*
Outputs:
₦ - 7, 100
Ꝭ - clerk
լ - Ohio
ѧ - dying
*/

The big idea is to split the string on " " to get the words, then use a regex to pick out the words that are wrapped in a special char. Iterating over the special array lets us build each pattern dynamically with string interpolation: the special char sits at both ends, and .+ accepts one or more characters in between. Because the split happens on spaces, a marked span that itself contains a space will not be matched.
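If marked spans may contain spaces, one alternative is to skip the split and run a single regex over the whole string, using a backreference so that a span must close with the same char that opened it. This is only a sketch of that idea, not the approach from the answer above: the helper name ExtractMarked and the sample sentence are made up for illustration, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input invented for this sketch; note the marked span containing a space.
string text = "We moved from լNew Yorkլ to լOhioլ with ₦2₦ dogs.";
char[] markers = { '₦', 'լ' };

Dictionary<char, List<string>> found = ExtractMarked(text, markers);
foreach (var kv in found)
{
    Console.WriteLine($"{kv.Key} - {string.Join(", ", kv.Value)}");
}

// Hypothetical helper (not part of the original answer): collects the text
// found between each pair of identical marker chars.
static Dictionary<char, List<string>> ExtractMarked(string input, char[] markerChars)
{
    // Every marker gets an entry, even if nothing in the input uses it.
    var result = markerChars.ToDictionary(c => c, c => new List<string>());

    // Build an alternation such as "₦|լ"; Regex.Escape keeps any
    // regex metacharacters among the markers literal.
    string alternatives = string.Join("|", markerChars.Select(c => Regex.Escape(c.ToString())));

    // (?<mark>...) captures the opening marker, (?<word>.+?) lazily captures
    // the text, and \k<mark> requires the same marker to close the span.
    var pattern = new Regex($@"(?<mark>{alternatives})(?<word>.+?)\k<mark>");

    foreach (Match match in pattern.Matches(input))
    {
        char marker = match.Groups["mark"].Value[0];
        result[marker].Add(match.Groups["word"].Value);
    }
    return result;
}
```

With this input, the լ entry holds "New York" and "Ohio", and the ₦ entry holds "2". The lazy .+? matters here: a greedy .+ would run from the first լ to the last one on the line.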

I chose a Dictionary for easy lookups: if you want just one of these arrays, use wordDict['₦'] or any other key from the special chars. You can also make it a Dictionary<char, List<string>> by swapping .ToArray() for .ToList() in the value selector lambda.
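For reference, here is a small sketch of the two usual ways to read from such a dictionary. The dictionary is filled in by hand as a stand-in for wordDict so the snippet runs on its own, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hand-built stand-in for the wordDict produced above.
var wordDict = new Dictionary<char, string[]>
{
    ['₦'] = new[] { "7", "100" },
    ['լ'] = new[] { "Ohio" },
};

// Indexer lookup: throws KeyNotFoundException if the key is absent.
string[] numbers = wordDict['₦'];
Console.WriteLine(string.Join(", ", numbers)); // prints "7, 100"

// TryGetValue: safe when the char may not be one of the keys.
if (wordDict.TryGetValue('ѧ', out var actions))
{
    Console.WriteLine(string.Join(", ", actions));
}
else
{
    Console.WriteLine("No entry for that symbol");
}
```

In the answer's code every char in special becomes a key, so a symbol with no matches maps to an empty array rather than being missing; TryGetValue only matters for chars outside that array.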

Edit: updated the Select query to mask the strings as something like "____":

Dictionary<char, string[]> wordDict = special
    .ToDictionary(
        c => c,
        c => words
            .Where(w => Regex.IsMatch(w, $"{c}.+{c}")) //match on flagged char + any length of characters + flagged char
            .Select(w => new string('_', w.Split(c)[1].Length))
            .ToArray()
    );

Only the .Select() call has changed; it maps the output strings. Here it creates a new string of '_' repeated n times, where n is the length of the word with its surrounding special chars stripped.
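If the goal is instead to mask the marked words inside the sentence itself (an assumption; the question does not say), Regex.Replace with a match evaluator can do it in one pass. The helper name MaskMarked and the sample sentence are invented for this sketch, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input invented for this sketch.
string sentence = "լOhioլ has ₦100₦ clerks.";
char[] special = { '₦', 'լ' };

Console.WriteLine(MaskMarked(sentence, special)); // prints "____ has ___ clerks."

// Hypothetical helper: replaces each marked span, markers included,
// with one underscore per character of the marked text.
static string MaskMarked(string input, char[] markerChars)
{
    // Alternation of the markers, escaped so each is treated literally.
    string alternatives = string.Join("|", markerChars.Select(c => Regex.Escape(c.ToString())));

    return Regex.Replace(
        input,
        $@"(?<mark>{alternatives})(?<word>.+?)\k<mark>",
        m => new string('_', m.Groups["word"].Length));
}
```

The evaluator receives each Match and returns its replacement, so the underscore count follows the captured text rather than the whole match.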
