What's the best way to split a list of strings to match first and last letters?

I have a long list of words in C#, and I want to find all the words in that list that share the same first letter and the same last letter and that have a length between, say, 5 and 7 characters. For example, the list might contain:

"wasted was washed washing was washes watched watches wilts with wastes wits washings"

It would return:

Length: 5-7, First letter: w, Last letter: d, "wasted, washed, watched"
Length: 5-7, First letter: w, Last letter: s, "washes, watches, wilts, wastes"

Then I might change the specification to a length of 3-4 characters, which would return:

Length: 3-4, First letter: w, Last letter: s, "was, wits"

I found this method of splitting, which is really fast, makes each item unique, uses the length, and gave me an excellent start: "Spliting string into words length-based lists c#"

Is there a way to modify or use that to take the first and last letters into account?

EDIT

I originally asked about the "fastest" way because I usually solve problems like this with lots of string arrays (which are slow and involve a lot of code). LINQ and lookups are new to me, but I can see that the ILookup used in the solution I linked to is amazingly simple and very fast. I don't actually need the minimum processor time. Any approach that saves me from creating separate arrays for this information would be fantastic.

This one-liner will give you groups of words with the same first and last letter within your length range:

int min = 5;
int max = 7;
var results = str.Split()
                 .Where(s => s.Length >= min && s.Length <= max)
                 .GroupBy(s => new { First = s.First(), Last = s.Last() });

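To show how the grouped result could be turned into the output format the question asks for, here is a minimal sketch. The `WordGroups.ByEnds` helper is a hypothetical name introduced for illustration, and the `Distinct()` call and the removal of empty entries are assumptions added so that the repeated "was" appears once and indexing is safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordGroups
{
    // Hypothetical helper: groups the distinct words whose length is in
    // [min, max] by their (first letter, last letter) pair.
    public static Dictionary<(char First, char Last), List<string>> ByEnds(
        string text, int min, int max)
    {
        return text
            // A null separator array means "split on whitespace".
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.Length >= min && s.Length <= max)
            .Distinct()
            .GroupBy(s => (First: s[0], Last: s[s.Length - 1]))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

public static class Program
{
    public static void Main()
    {
        const string str =
            "wasted was washed washing was washes watched watches wilts with wastes wits washings";
        int min = 5, max = 7;

        foreach (var group in WordGroups.ByEnds(str, min, max))
        {
            Console.WriteLine(
                $"Length: {min}-{max}, First letter: {group.Key.First}, " +
                $"Last letter: {group.Key.Last}, \"{string.Join(", ", group.Value)}\"");
        }
    }
}
```

Note that grouping alone also yields single-word groups (for example "washing" under w/g for lengths 5-7); if those are unwanted, a filter such as `.Where(g => g.Count() > 1)` before `ToDictionary` would drop them.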
var minLength = 5;
var maxLength = 7;
var firstPart = "w";
var lastPart = "d";

var words = new List<string> { "washed", "wash" }; // so on

var matches = words.Where(w => w.Length >= minLength && w.Length <= maxLength && 
                               w.StartsWith(firstPart) && w.EndsWith(lastPart))
                   .ToList();

For the most part, this should be fast enough, unless you're dealing with tens of thousands of words and worrying about milliseconds. In that case, we can look further.

I just created this in LINQPad:

void Main()
{
    var words = new[] { "wasted", "was", "washed", "washing", "was", "washes", "watched", "watches", "wilts", "with", "wastes", "wits", "washings" };

    var firstLetter = "w";
    var lastLetter = "d";
    var minimumLength = 5;
    var maximumLength = 7;

    // Filters the words; despite the variable name, nothing is sorted here.
    var sortedWords = words.Where(w => w.StartsWith(firstLetter) && w.EndsWith(lastLetter) && w.Length >= minimumLength && w.Length <= maximumLength);
    sortedWords.Dump(); // Dump() is a LINQPad extension method
}

If that isn't fast enough, I would create a lookup table:

Dictionary<char, Dictionary<char, List<string>>> lookupTable;

and then query it (here firstLetter and lastLetter must be chars, not strings):

lookupTable[firstLetter][lastLetter].Where(w => w.Length >= minimumLength && w.Length <= maximumLength)
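The answer does not show how the lookup table gets filled, so the following is only a sketch of one way it might be built and queried. `WordIndex`, `Build`, and `Find` are hypothetical names, and skipping empty strings and duplicates is an assumption, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIndex
{
    // Hypothetical helper: builds first letter -> last letter -> distinct words.
    public static Dictionary<char, Dictionary<char, List<string>>> Build(IEnumerable<string> words)
    {
        var table = new Dictionary<char, Dictionary<char, List<string>>>();
        foreach (var word in words.Where(w => !string.IsNullOrEmpty(w)).Distinct())
        {
            char first = word[0];
            char last = word[word.Length - 1];

            if (!table.TryGetValue(first, out var byLast))
            {
                byLast = new Dictionary<char, List<string>>();
                table[first] = byLast;
            }
            if (!byLast.TryGetValue(last, out var bucket))
            {
                bucket = new List<string>();
                byLast[last] = bucket;
            }
            bucket.Add(word);
        }
        return table;
    }

    // Returns an empty list instead of throwing when a letter pair is absent.
    public static List<string> Find(
        Dictionary<char, Dictionary<char, List<string>>> table,
        char first, char last, int minLength, int maxLength)
    {
        if (table.TryGetValue(first, out var byLast) &&
            byLast.TryGetValue(last, out var bucket))
        {
            return bucket.Where(w => w.Length >= minLength && w.Length <= maxLength).ToList();
        }
        return new List<string>();
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new[] { "wasted", "was", "washed", "washing", "was", "washes",
                            "watched", "watches", "wilts", "with", "wastes", "wits", "washings" };
        var table = WordIndex.Build(words);
        Console.WriteLine(string.Join(", ", WordIndex.Find(table, 'w', 'd', 5, 7)));
    }
}
```

Using `TryGetValue` rather than the indexer avoids a `KeyNotFoundException` when no word has the requested first or last letter. The table is built once, so repeated queries with different length ranges only scan one bucket.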

Here's a method that does exactly what you want. You are only given a list of strings and the min/max length, correct? You aren't given the first and last letters to filter on. This method processes all the first/last letter combinations that occur in the strings.

private static void ProcessInput(string[] words, int minLength, int maxLength)
{
    var groups = from word in words
                 where word.Length > 0 && word.Length >= minLength && word.Length <= maxLength
                 let key = new Tuple<char, char>(word.First(), word.Last())
                 group word by key into @group
                 orderby Char.ToLowerInvariant(@group.Key.Item1), @group.Key.Item1, Char.ToLowerInvariant(@group.Key.Item2), @group.Key.Item2
                 select @group;
    Console.WriteLine("Length: {0}-{1}", minLength, maxLength);
    foreach (var group in groups)
    {
        Console.WriteLine("First letter: {0}, Last letter: {1}", group.Key.Item1, group.Key.Item2);
        foreach (var word in group)
            Console.WriteLine("\t{0}", word);
    }
}

Just as a quick thought: I have no idea whether this would be faster or more efficient than the LINQ solutions posted, but this could also be done fairly easily with regular expressions.

For example, if you wanted to get words of 5-7 letters that begin with "w" and end with "s", you could use a pattern along the lines of:

\bw[A-Za-z]{3,5}s\b

(This could fairly easily be made more variable-driven. For example, have variables for the first letter, min length, max length, and last letter, and plug them into the pattern to replace w, 3, 5, and s. Note that the quantifier bounds are the word lengths minus 2, since the first and last letters are matched separately.)

Then, using the Regex class, you could just take the matches as your list.
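As a rough sketch of the variable-driven version described above, the pattern could be assembled like this. `WordRegex.Find` is a hypothetical helper, the `[A-Za-z]` class is taken from the pattern in this answer, and the code assumes words of at least two letters, since the first and last letters are matched as separate characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordRegex
{
    // Hypothetical helper: returns the distinct words in `text` that start with
    // `first`, end with `last`, and are minLength..maxLength letters long.
    // Assumption: only words of 2+ letters are of interest.
    public static List<string> Find(
        string text, char first, char last, int minLength, int maxLength)
    {
        // The first and last letters are matched separately, so the middle
        // part is two characters shorter than the whole word.
        int minMiddle = Math.Max(0, minLength - 2);
        int maxMiddle = maxLength - 2;
        if (maxMiddle < minMiddle)
        {
            return new List<string>();
        }

        string pattern =
            @"\b" + Regex.Escape(first.ToString()) +
            "[A-Za-z]{" + minMiddle + "," + maxMiddle + "}" +
            Regex.Escape(last.ToString()) + @"\b";

        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()
                    .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        const string text =
            "wasted was washed washing was washes watched watches wilts with wastes wits washings";
        Console.WriteLine(string.Join(", ", WordRegex.Find(text, 'w', 's', 5, 7)));
    }
}
```

`Regex.Escape` is there so that a first or last character with special meaning in a pattern (such as `.`) is treated literally. Unlike the grouping approaches, this needs one pass over the text per letter pair.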

Again, I don't know how this compares to LINQ efficiency-wise, but I thought it deserved a mention.

Hope this helps!!
