简体   繁体   中英

Is there an elegant LINQ solution for SomeButNotAll()?

Here is what I'm trying to do overall. Just to be clear, this isn't homework or for a contest or anything. Hopefully, I've made the wording clear enough:

Problem

Given a set of strings in the same format, but where some end in a lowercase letter and some do not, return a set of one of each string that does not end in a lowercase letter, but that has at least one identical string ending in a lowercase letter.

Example

To keep it simple, let's say the string format is \\d+[az]? , where the common part is the number. Given {1, 4, 3a, 1b, 3, 6c} , I should receive a permutation of {1, 3} because 1 and 3 have both an element with and without a lowercase letter at the end.

Alternate Solution

You can view this solution here .

One way I thought of doing this was to partition the set into elements with and without a lowercase letter suffix ( {1, 4, 3} and {3a, 1b, 6c} ), and then return withoutSuffix.Where(x => withSuffix.Any(y => y.StartsWith(x))) .

I have two problems with this:

  1. I don't see a good way to partition into the two sets, with the predicate Regex.IsMatch(input, "[az]$") . The two I thought of were two similarly-defined variables each using a Where clause and doing the regex matching twice per element, or transforming the set to store the regex match results and then forming the two variables from that. group...by doesn't seem to play well when you need to access both sets like this, but I could be wrong there.

  2. Although the size is small enough not to care about performance, going through withSuffix once per withoutSuffix element seems inelegant.

Relevant Solution

You can view this solution here .

The other way that came to mind was to grab the common prefix and the optional suffix: {1 => {"", b}, 3 => {a, ""}, 4 => {""}, 6 => {c}} . This is easily accomplished by capturing the prefix and the suffix with a regex ( (\\d+)([az])? ) and grouping the suffix by the prefix into grouped .

From here, it would be great to do:

where grouped.SomeButNotAll(x => x == string.Empty)
select grouped.Key

Or even:

where grouped.ContainsSomeButNotAll(string.Empty)
select grouped.Key

I could certainly create either of these, but unfortunately, the best I can see in LINQ is:

where grouped.Contains(string.Empty) && grouped.Any(x => x != string.Empty)
select grouped.Key

It just feels super redundant. Is there anything better than this in LINQ already?

PS I'm open to better approaches to solving the overall problem instead of making this an XY problem. Elegance is desired much more than performance, but (maybe it's just me) being plain wasteful still seems inelegant.

I don't think you really need regexen for this. Here's how I would do it:

var withEndings = new HashSet<string>();
var withoutEndings = new HashSet<string>();

foreach (var s in input)
    if(char.IsLower(s[s.Length - 1])) 
        withEndings.Add(s.Substring(0, s.Length - 1));
    else
        withoutEndings.Add(s);

var result = withEndings.Intersect(withoutEndings);

You could add another grouping by string.IsNullOrEmpty and verify that it has 2 groups (one for false and one for true ):

return
    from str in strs
    let match = Regex.Match(str, STR_FORMAT)
    group match.Groups[2].Value by match.Groups[1].Value into parts
    where (parts.GroupBy(string.IsNullOrEmpty).Count() == 2)
    select parts.Key;

In this case I think the most performant solution is not necessarily very elegant in LINQ. I think this should do what you want and do it O(N) run time.

values
.Aggregate(
new { HashSet1 = new HashSet<string>(), HashSet2 = new HashSet<string>() },
(a, x) =>
{
    // If the last character is a lowercase letter then put the string
    // (minus the last character) in HashSet1, otherwise, put the string
    // in HashSet2
    if(Char.IsLower(x, x.Length - 1))
    {
        a.HashSet1.Add(x.Substring(0, x.Length - 1));
    }
    else
    {
        a.HashSet2.Add(x);
    }
    return a;
},
a => 
{
    // Return all the strings that are present in both hash sets.
    return 
    a
    .HashSet1
    .Where(x => a.HashSet2.Contains(x));
});

.Where() on each element, and for each one, .Where() on each element again , making sure at least one fits the regex pattern of original element plus any one lowercase letter.

var input = new List<string>() { "1", "4", "3a", "1b", "3", "6c" };
var output = input.Where(
    x => input.Where(
        y => Regex.Match(y, "^" + Regex.Escape(x) + "[a-z]$").Success
    ).Any()
);

output contains { "1", "3" } .

You can change .Any() to !All() .

I would prefer using the Count overload with a predicate and comparing to the overall count. That will probably be the cleanest and you won't have to worry about exceptions stemming from empty collections.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM