
LINQ query returning the entire object grouped by multiple columns when one is distinct

I have a quandary I am trying to solve with LINQ, but I haven't found a working solution yet.

I have a list of businesses coming back that contain large amounts of data. I need to preserve all this data so I have access to it while grouping it and eliminating certain duplicates.

So the main properties I am interested in grouping by would be Address1, Address2 and BusinessName.

I want to group first by business name, then by address 1, and then by address 2, but ONLY when address 2 is distinct. The reason is that the same address can be written in multiple ways, and usually the variation is in address 2. That is fine for now: if it is written as Suite 200 or Ste 200, the two will be treated as different. This is needed so that we don't eliminate actual differences when multiple offices are located in the same building (i.e. Ste 200 and Ste 100 are the same business with different offices). However, I don't want to return the same addresses with the same address 2 values repeated.

var myNonDupOfficeList = officeList
    .GroupBy(o => new { o.Address1, o.Address2, o.BusinessName})
    .OrderBy(g => g.Key.BusinessName).ThenBy(g => g.Key.Address1).ThenBy(g => g.Key.Address2)
    .Select(o => o.FirstOrDefault()).ToList();

The code I have already written does this fine, but the issue is that I lose all the other data I need. If I include that data in the new { } object, it adds differences that increase the number of groups, based on properties I don't want to group by. For instance, I could add City, State and Zipcode, but for my purposes they are not relevant. The zipcode data isn't always correct, so someone entering a wrong zipcode will produce another group, and St Louis, St. Louis and Saint Louis will all end up in different groups.

City, State and Zipcode are not relevant to how I want to group, but I NEED access to that data once it's been grouped by BusinessName, Address1 and Address2. How can I achieve this using LINQ?

I tried this in LINQPad against the Northwind database and I think it does what you are after:

Customers
    .GroupBy(i => new { i.Country, i.City})
    .OrderBy(i => i.Key.City)
    .ThenBy(i => i.Key.Country)
    .Select(i => new { Row = i.FirstOrDefault(), Cnt = i.Count()})
    .Dump();

I included a count so I could see how many items were in each group.
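
To see that the other properties survive this kind of grouping, here is a minimal in-memory sketch. The `Office` record, its sample values, and the `OfficeDedup` helper are hypothetical stand-ins for the real business type; only the three key properties come from the question. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real business type.
public record Office(string BusinessName, string Address1, string Address2,
                     string City, string State, string Zipcode);

public static class OfficeDedup
{
    // Group on the three key properties only, then keep the first full
    // object from each group so City, State and Zipcode stay available.
    public static List<Office> FirstPerAddress(IEnumerable<Office> offices) =>
        offices
            .GroupBy(o => new { o.BusinessName, o.Address1, o.Address2 })
            .OrderBy(g => g.Key.BusinessName)
            .ThenBy(g => g.Key.Address1)
            .ThenBy(g => g.Key.Address2)
            .Select(g => g.First())   // a group is never empty, so First() is safe
            .ToList();

    // Made-up sample data: two spellings of the same office plus a second suite.
    public static List<Office> Sample() => new()
    {
        new("Acme", "1 Main St", "Ste 200", "St Louis",    "MO", "63101"),
        new("Acme", "1 Main St", "Ste 200", "Saint Louis", "MO", "63102"),
        new("Acme", "1 Main St", "Ste 100", "St. Louis",   "MO", "63101"),
    };
}
```

Calling `OfficeDedup.FirstPerAddress(OfficeDedup.Sample())` collapses the two "Ste 200" rows into one and keeps "Ste 100" as a separate office. Each returned item is a complete `Office`, so its City, State and Zipcode can still be read even though they were not part of the grouping key.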

While it's a little more work up front, the best approach would be to create a type containing just the fields you want and create a new instance of that type when you perform your initial query.

public class MyBusiness
{
    public string BusinessName { get; set; }
    public string BusinessAddress1 { get; set; }
    public string BusinessAddress2 { get; set; }
}

then

var myNonDupOfficeList = officeList
    .GroupBy(o => new { o.Address1, o.Address2, o.BusinessName })
    .OrderBy(g => g.Key.BusinessName).ThenBy(g => g.Key.Address1).ThenBy(g => g.Key.Address2)
    // Each element here is a group, so the values come from the group's Key.
    .Select(g => new MyBusiness
    {
        BusinessName = g.Key.BusinessName,
        BusinessAddress1 = g.Key.Address1,
        BusinessAddress2 = g.Key.Address2
    }).ToList();
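
If the remaining data is still needed after projecting to a trimmed-down type, one option is to carry the group's original rows along with the key. The sketch below is an assumption and not part of the answer above: `OfficeEntry`, `BusinessGroup`, its `Offices` property, and the `BusinessGrouping` helper are all hypothetical names. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical source type; only the three key properties come from the question.
public class OfficeEntry
{
    public string BusinessName { get; set; } = "";
    public string Address1 { get; set; } = "";
    public string Address2 { get; set; } = "";
    public string City { get; set; } = "";
}

// Hypothetical result type: the grouping key plus every row that shares it.
public class BusinessGroup
{
    public string BusinessName { get; set; } = "";
    public string BusinessAddress1 { get; set; } = "";
    public string BusinessAddress2 { get; set; } = "";
    public List<OfficeEntry> Offices { get; set; } = new();
}

public static class BusinessGrouping
{
    public static List<BusinessGroup> Group(IEnumerable<OfficeEntry> offices) =>
        offices
            .GroupBy(o => new { o.BusinessName, o.Address1, o.Address2 })
            .OrderBy(g => g.Key.BusinessName)
            .ThenBy(g => g.Key.Address1)
            .ThenBy(g => g.Key.Address2)
            .Select(g => new BusinessGroup
            {
                BusinessName = g.Key.BusinessName,
                BusinessAddress1 = g.Key.Address1,
                BusinessAddress2 = g.Key.Address2,
                Offices = g.ToList()   // keep the full rows, nothing is discarded
            })
            .ToList();
}
```

The trade-off is that each result now holds a list rather than a single row, so the caller decides which spelling of the city or which zipcode to trust instead of silently taking the first one.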
