Group list entries with LINQ

I have the following model:

public class Entry
{
    public int UseraccountId { get; set; }
    public int CompanyId { get; set; }
    public DateTime CreationDate { get; set; }
    public string Target { get; set; }
    public string Message { get; set; }
}

And a list with a lot of entries:

List<Entry> entries = ... //get all entries.

Example:

[Image: example before grouping]

I'd now like rows 2 and 3 to be grouped because they have the same UserId, the same CompanyId, the same Target and almost the same date and time (this is the difficult part), say within a range of 5 seconds.

After grouping, my list should look like this:

[Image: list after grouping]

Is there an easy approach to this problem? Any advice? I bet LINQ can help me here, but I'm not sure how.

Edit: Thank you all for your feedback. I decided to change the design so that the datetime is now exactly the same. Grouping with LINQ is now very easy.

As @dtb mentions, grouping by "close" is difficult because you can end up with a bigger "bucket" than you intended. For example, if you have 100 entries that are created 4 seconds apart from each other, grouping items that are within 5 seconds of the "next" item would put all of them in one bucket!
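To make the chaining effect concrete, here is a minimal sketch. `ChainingDemo`, `AllNeighborsWithin` and `Span` are hypothetical names introduced only for this illustration; they are not part of the question or of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainingDemo
{
    // True when every consecutive pair of timestamps is at most maxGap apart.
    // Assumes the list is already sorted in ascending order.
    public static bool AllNeighborsWithin(IReadOnlyList<DateTime> sorted, TimeSpan maxGap) =>
        sorted.Zip(sorted.Skip(1), (a, b) => b - a).All(gap => gap <= maxGap);

    // Distance between the first and the last timestamp of a sorted list.
    public static TimeSpan Span(IReadOnlyList<DateTime> sorted) =>
        sorted.Count == 0 ? TimeSpan.Zero : sorted[sorted.Count - 1] - sorted[0];
}
```

For 100 timestamps spaced 4 seconds apart, `AllNeighborsWithin` with a 5-second gap returns true, yet `Span` is 396 seconds (99 gaps of 4 seconds), so a "within 5 seconds of the next item" rule would produce one bucket covering more than six minutes.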

If, however, you want to round the creation date to the nearest, say, 5 seconds and then group, you could use:

TimeSpan ts = new TimeSpan(0, 0, 5);  // 5 seconds
var grouped = entries.GroupBy(i => new {
                          UserId = i.UseraccountId,  // the model's property is named UseraccountId
                          CompanyId = i.CompanyId,
                          Target = i.Target,
                          // round CreationDate to the nearest multiple of ts
                          RoundedTime = DateTime.MinValue.AddTicks(
                                            (long)(Math.Round((decimal)i.CreationDate.Ticks / ts.Ticks) * ts.Ticks)
                                        )
                      })
       .Select(g => new {
                         UserId = g.Key.UserId,
                         CompanyId = g.Key.CompanyId,
                         Target = g.Key.Target,
                         RoundedTime = g.Key.RoundedTime,
                         // concatenate the messages of all entries in the bucket
                         Message = string.Join(", ", g.Select(i => i.Message).ToArray())
                        });

That will group items whose timestamps round to the same 5-second mark. It's possible that two items one second apart will land in different buckets, but you avoid the chaining problem (closeness is not transitive) that your stated requirement has.

This would give a range of 5 seconds, but for a full matching (clustering) algorithm on the CreationDate, I'm guessing it'll be far more difficult. You get the idea, though.

// GroupBy has no ThenBy; combine the key parts into one anonymous type instead.
var groups = entries.GroupBy(a => new {
                                 a.UseraccountId,
                                 a.CompanyId,
                                 Window = a.CreationDate.Ticks / TimeSpan.FromSeconds(5).Ticks  // 5-second window index
                             });

There is no straightforward answer, as this depends on what you consider to be a match. There are simple approaches, complicated ones, and everything in between. You'll need to come up with the algorithm for that. An easy approach would be to shave off the seconds and just match down to the minute; however, that window may be too long. You could write a method that normalizes timestamps down to 5 or 10 seconds and group on that, as already suggested.

If you want to group any two messages that are within x seconds of each other, then this approach will only mostly work. There will always be values that are within the range but fall on either side of the cutoff. If you are OK with this and value simplicity, then the above answer will work.
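A normalizing method of the kind described could look roughly like this. `TimestampNormalizer.Floor` is a hypothetical helper, and it floors to the start of the window rather than rounding to the nearest mark, which is an assumption made here for simplicity.

```csharp
using System;

public static class TimestampNormalizer
{
    // Hypothetical helper: floors a timestamp to the start of its bucket,
    // e.g. 12:00:07 with a 5-second bucket becomes 12:00:05.
    public static DateTime Floor(DateTime value, TimeSpan bucket)
    {
        if (bucket <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(bucket));

        // Drop the remainder so the result is a whole multiple of the bucket size.
        long ticks = value.Ticks - (value.Ticks % bucket.Ticks);
        return new DateTime(ticks, value.Kind);
    }
}
```

The result can be used as one part of the grouping key. The cutoff issue is easy to see with it: 12:00:04 and 12:00:05 are one second apart, but with a 5-second bucket they floor to 12:00:00 and 12:00:05 and so end up in different groups.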

If that doesn't work and you want to group across the artificial cutoff, then you'll need another approach. A simple approach in this case could be to use LINQ to group by everything but the timestamp. This gives you a preliminary grouping of your data. Then you can iterate through each group, compare each time value to every other time value in the same group and determine whether it is within your range. Finally, manually grab the values that fall within the specified range and group them together.
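One way to sketch this two-pass idea is shown below. `TwoPassGrouping.Group` is a hypothetical helper, and instead of comparing every pair it sorts each preliminary group by time and compares each entry only with its predecessor, which is a simplification assumed here. With that rule, buckets can grow by chaining.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Entry  // same properties as the model in the question
{
    public int UseraccountId { get; set; }
    public int CompanyId { get; set; }
    public DateTime CreationDate { get; set; }
    public string Target { get; set; } = "";
    public string Message { get; set; } = "";
}

public static class TwoPassGrouping
{
    public static List<List<Entry>> Group(IEnumerable<Entry> entries, TimeSpan maxGap)
    {
        var result = new List<List<Entry>>();

        // Pass 1: preliminary grouping on everything except the timestamp.
        var preliminary = entries.GroupBy(e => new { e.UseraccountId, e.CompanyId, e.Target });

        foreach (var group in preliminary)
        {
            // Pass 2: walk the group in time order and start a new cluster
            // whenever the gap to the previous entry exceeds maxGap.
            var clusters = new List<List<Entry>>();
            foreach (var entry in group.OrderBy(e => e.CreationDate))
            {
                var last = clusters.Count == 0 ? null : clusters[clusters.Count - 1];
                if (last == null || entry.CreationDate - last[last.Count - 1].CreationDate > maxGap)
                    clusters.Add(new List<Entry>());
                clusters[clusters.Count - 1].Add(entry);
            }
            result.AddRange(clusters);
        }
        return result;
    }
}
```

Each inner list can then be projected into a single row, for example by joining the `Message` values with `string.Join`.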

This has an additional edge case that you'll need to make a decision on. Suppose you decide to group within 1 second and you have three entries whose seconds are (simplified) 1, 2, and 3. 1 and 2 are within a second of each other, and so are 2 and 3, but 1 and 3 aren't. Would you group all three because 2 is within one second of the others, or would you group 1 and 2, making 2 ineligible to be grouped with 3, so that 3 would be by itself?

You'll ultimately wind up with either a solution whose buckets can grow by chaining values together, or a different artificial cutoff based on the first entry of each group rather than on hard time cutoffs. The hard time cutoff is far simpler, so unless you need growing buckets, I'd recommend just going with a normalized timestamp and grouping on that.
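The "cutoff based on the first entry" variant can be sketched as follows. `AnchoredBuckets.Split` is a hypothetical helper that works on bare timestamps to keep the example short; in practice it would run on each preliminary group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnchoredBuckets
{
    // Hypothetical helper: each bucket is anchored at its first (earliest)
    // timestamp and only accepts timestamps within `window` of that anchor,
    // so a bucket never spans more than `window` and cannot grow by chaining.
    public static List<List<DateTime>> Split(IEnumerable<DateTime> times, TimeSpan window)
    {
        var buckets = new List<List<DateTime>>();
        foreach (var t in times.OrderBy(x => x))
        {
            // Compare against the anchor of the current bucket, not the previous item.
            if (buckets.Count == 0 || t - buckets[buckets.Count - 1][0] > window)
                buckets.Add(new List<DateTime>());
            buckets[buckets.Count - 1].Add(t);
        }
        return buckets;
    }
}
```

For the simplified seconds 1, 2 and 3 with a 1-second window, this puts 1 and 2 together and leaves 3 by itself, whereas comparing each item with its predecessor would chain all three into one bucket.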

You need to define what you mean by "almost" and plan accordingly.
