Complex LINQ to XML query assistance

I don't know if the query I am trying to write is even possible, but if one of you LINQ to SQL/XML gurus can figure this out I will be so thankful and salute you as a LINQ God. My end goal is to identify all of the XML Models that are duplicates and show the CECID for all the duplicates except one. So let's say I have an XDocument that looks like this:

<ApplianceModels xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ApplianceType="IceMakers">
    <Model>
        <ReferenceNumber>201877149</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>1234212</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>B</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor xsi:nil="true" />
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>d579ae7a-f3f7-4627-a3f1-f17b23aa28e3</CECID>
    </Model>
    <Model>
        <ReferenceNumber>201877143</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>12342</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>B</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor xsi:nil="true" />
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>94c6d6e6-5b6a-4f45-a7ff-70a64e50e4e6</CECID>
    </Model>
    <Model>
        <ReferenceNumber>201877152</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>1231114234</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>C</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>81</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>1.10</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor>4.45</IceHardnessAdjustmentFactor>
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>d97a603c-1836-43a3-b564-ab8d1bdec65f</CECID>
    </Model>
</ApplianceModels>

Then in SQL Server I have a table called tApplianceTypeColumns that looks like this for a given appliance type:

ApplianceTypeID       ApplianceColumnUnique        ApplianceColumnName
10                    0                            ReferenceNumber
10                    1                            Brand
10                    1                            ModelNumber
10                    0                            EquipmentType
10                    0                            CoolingType
10                    0                            IceType
10                    0                            IceMakerProcessType
10                    0                            HarvestRateLbs24Hr
10                    0                            EnergyCons_kWhPer100Lbs
10                    0                            WaterCons_galPer100lbs
10                    1                            RegulatoryStatus

So here is what I started with, but I am far from close:

var DupeItems = from m in doc.Descendants("Model").Elements()
                join at in entities.tApplianceTypeColumns on m.Name equals at.ApplianceColumnName
                group m by m.Element(at.ApplianceColumnName).Value into d
                where at.ApplianceTypeID == ApplianceTypeID

So really I want to be able to group by Brand, ModelNumber, and RegulatoryStatus, which are the columns in the tApplianceTypeColumns table that have the ApplianceColumnUnique bit column set to true. The number of true bits could vary depending on the ApplianceTypeID I am looking up in that table.

Additionally, I also need to include two elements in the grouping that are never in the tApplianceTypeColumns table: Action, then ManufacturerCode, followed by all the other unique elements from tApplianceTypeColumns in no specific order.

The ApplianceTypeID is a known parameter that will be passed to the query. So for any set of duplicates I need to display the CECID for the 2nd and subsequent duplicates, so that I can take those CECIDs and do lookups in other tables to change their status. But this first step is tough. I don't care which of the duplicates does not get displayed. I just need to display all of the others except one. I hope I have explained this well enough.

The task can be split into 3 steps:

  1. Find the unique columns to group with:

    So really I want to be able to group by Brand, ModelNumber, and RegulatoryStatus, which are the columns in the tApplianceTypeColumns table that have the ApplianceColumnUnique bit column set to true. The number of true bits could vary depending on the ApplianceTypeID I am looking up in that table. Additionally, I also need to include two elements in the grouping that are never in the tApplianceTypeColumns table: Action, then ManufacturerCode, followed by all the other unique elements from tApplianceTypeColumns in no specific order.

     Enumerable.Concat(
         "Action,ManufacturerCode".Split(','),
         applianceTypeColumns
             .Where(at => at.ApplianceColumnUnique)
             .Select(at => at.ApplianceColumnName)
     );

  2. Group the models by the columns from the previous step:

    We project the column names into the column values of each model:

     applianceModels.GroupBy(
         model => uniqueColumns.Select(columnName => model.Element(columnName)?.Value).ToArray(),
         comparer) // comparer: the IEqualityComparer<string[]> described next

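    However, we can't just group by an array of strings, because arrays are compared by reference rather than by content, so we need to provide a custom IEqualityComparer. Note the parentheses around the null-coalescing expression: without them, ?? binds more loosely than + and a single null value would reset the whole hash to 0:

     new LambdaComparer<string[]>((a, b) => a.SequenceEqual(b), x => x.Aggregate(13, (hash, y) => hash * 7 + (y?.GetHashCode() ?? 0))) 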

  3. Aggregate the duplicates:

     .Select(g => new { g.Key, Duplicates = g.Select(x => x.Element("CECID")?.Value) }) 
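
To make the reason for the comparer in step 2 concrete, here is a minimal sketch that groups the same keys with and without a content-based comparer. `StringArrayComparer` and `GroupingDemo.CountGroups` are hypothetical names introduced only for this illustration; they are not part of the answer's code, which uses `LambdaComparer<string[]>` instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares string arrays by content instead of by reference.
public sealed class StringArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] a, string[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(string[] key)
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 13;
            foreach (var part in key)
                hash = hash * 7 + (part?.GetHashCode() ?? 0); // null parts contribute 0
            return hash;
        }
    }
}

public static class GroupingDemo
{
    // Counts the groups produced for the given keys.
    // Passing null for the comparer makes GroupBy fall back to the default equality.
    public static int CountGroups(IEnumerable<string[]> keys, IEqualityComparer<string[]> comparer = null)
        => keys.GroupBy(k => k, comparer).Count();
}
```

Given two separate arrays that both contain "C" and "ARN2396", `GroupingDemo.CountGroups(keys)` yields 2 groups because the arrays are different objects, while `GroupingDemo.CountGroups(keys, new StringArrayComparer())` yields 1.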

Everything put together:

void Main()
{
    const int ApplianceTypeID = 10;

    var applianceModels = GetApplianceModels().XPathSelectElements("Model"); //.Dump();
    var applianceTypeColumns = GetApplianceTypeColumns().Where(x => x.ApplianceTypeID == ApplianceTypeID); //.Dump();

    var uniqueColumns = Enumerable.Concat(
        "Action,ManufacturerCode".Split(','),
        applianceTypeColumns
            .Where(at => at.ApplianceColumnUnique)
            .Select(at => at.ApplianceColumnName)
    );

    var query = applianceModels
        .GroupBy(
            model => uniqueColumns.Select(columnName => model.Element(columnName)?.Value).ToArray(),
            new LambdaComparer<string[]>((a, b) => a.SequenceEqual(b), x => x.Aggregate(13, (hash, y) => hash * 7 + (y?.GetHashCode() ?? 0)))
        )
        .Where(x => x.Count() > 1)
        .Select(g => new { g.Key, Duplicates = g.Select(x => x.Element("CECID")?.Value) });
        //.Dump();
}

// Define other methods and classes here
XElement GetApplianceModels()
{
    return XElement.Parse(
@"<ApplianceModels xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" ApplianceType=""IceMakers"">
    <Model>
        <ReferenceNumber>201877149</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>1234212</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>B</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor xsi:nil=""true"" />
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>d579ae7a-f3f7-4627-a3f1-f17b23aa28e3</CECID>
    </Model>
    <Model>
        <ReferenceNumber>201877143</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>12342</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>B</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor xsi:nil=""true"" />
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>94c6d6e6-5b6a-4f45-a7ff-70a64e50e4e6</CECID>
    </Model>
    <Model>
        <ReferenceNumber>201877152</ReferenceNumber>
        <Action>C</Action>
        <Brand>4564</Brand>
        <ModelNumber>1231114234</ModelNumber>
        <EquipmentType>A</EquipmentType>
        <CoolingType>W</CoolingType>
        <IceType>C</IceType>
        <IceMakerProcessType>C</IceMakerProcessType>
        <TestLabCode>ARN3190</TestLabCode>
        <ManufacturerCode>ARN2396</ManufacturerCode>
        <HarvestRateLbs24Hr>81</HarvestRateLbs24Hr>
        <EnergyCons_kWhPer100Lbs>1.10</EnergyCons_kWhPer100Lbs>
        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>
        <IceHardnessAdjustmentFactor>4.45</IceHardnessAdjustmentFactor>
        <RegulatoryStatus>I</RegulatoryStatus>
        <CECID>d97a603c-1836-43a3-b564-ab8d1bdec65f</CECID>
    </Model>
</ApplianceModels>");
}
IEnumerable<(int ApplianceTypeID, bool ApplianceColumnUnique, string ApplianceColumnName)> GetApplianceTypeColumns()
{
    var data =
@"ApplianceTypeID       ApplianceColumnUnique        ApplianceColumnName
10                    0                            ReferenceNumber
10                    1                            Brand
10                    1                            ModelNumber
10                    0                            EquipmentType
10                    0                            CoolingType
10                    0                            IceType
10                    0                            IceMakerProcessType
10                    0                            HarvestRateLbs24Hr
10                    0                            EnergyCons_kWhPer100Lbs
10                    0                            WaterCons_galPer100lbs
10                    1                            RegulatoryStatus";
    return Regex.Matches(data, @"^(\d+)\s+(\d+)\s+(\w+)", RegexOptions.Multiline)
        .Cast<Match>()
        .Select(x => 
        (
            /*ApplianceTypeID = */int.Parse(x.Groups[1].Value),
            /*ApplianceColumnUnique = */int.Parse(x.Groups[2].Value) != 0,
            /*ApplianceColumnName = */x.Groups[3].Value
        ));
}

class LambdaComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> equals;
    private readonly Func<T, int> getHashCode;

    public LambdaComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        this.equals = equals;
        this.getHashCode = getHashCode;
    }

    public bool Equals(T x, T y) => equals(x, y);
    public int GetHashCode(T obj) => getHashCode(obj);
}

Here is my final code based on Xiaoy312's solution. Thank you again. It works well. I salute you as a LINQ God:

private List<string> XMLDuplicatesToEliminate(XDocument doc, Guid ApplianceTypeID)
{
    var entities = new DbContextFactory().MAEDBSEntities;

    var applianceModels = doc.Descendants("Model");
    var applianceTypeColumns =
    (
        from at in entities.tApplianceTypeColumns
        where
            at.ApplianceTypeID == ApplianceTypeID &&
            at.ApplianceColumnUnique == true
        select new { at.ApplianceColumnName }
    ).ToList();

    var uniqueColumns = Enumerable.Concat(
        "Action,ManufacturerCode".Split(','),
        applianceTypeColumns
            .Select(at => at.ApplianceColumnName)
    );

    List<string> DuplicatesToEliminate = new List<string>();
    var duplicates = applianceModels
        .GroupBy(
            model => uniqueColumns.Select(columnName => model.Element(columnName)?.Value).ToArray(),
            new LambdaComparer<string[]>((a, b) => a.SequenceEqual(b), x => x.Aggregate(13, (hash, y) => hash * 7 + (y?.GetHashCode() ?? 0))))
        .Where(x => x.Count() > 1)
        .Select(g => new { g.Key, Duplicates = g.Select(x => x.Element("CECID")?.Value) })
        .ToList();

    foreach (var duperow in duplicates)
    {
        string firstdupe = duperow.Duplicates.First();
        IEnumerable<string> allbutone = duperow.Duplicates.Where(x => x != firstdupe);
        foreach (string dupeitem in allbutone)
        {
            DuplicatesToEliminate.Add(dupeitem);
        }
    }

    return DuplicatesToEliminate;
}
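
For comparison, the "all but one per group" step could probably be written more compactly. The sketch below is an alternative, not the code from this thread: it swaps the custom comparer for a joined-string key and uses `Skip(1)` to drop the first model of each group. `DuplicateFinder`, `CecidsToEliminate`, and the `keyColumns` parameter are hypothetical names, and it assumes that the key columns have already been loaded, that no element value contains the `\u001F` separator character, and that a missing element may be treated like an empty one.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateFinder
{
    // keyColumns: the element names that define uniqueness, for example
    // Action, ManufacturerCode, plus the unique columns for the appliance type.
    public static List<string> CecidsToEliminate(XContainer doc, IEnumerable<string> keyColumns)
    {
        var columns = keyColumns.ToList(); // materialize once, not once per model

        return doc.Descendants("Model")
            // Strings compare by value, so a joined key needs no custom comparer.
            // Assumption: values never contain the unit separator U+001F.
            .GroupBy(m => string.Join("\u001F", columns.Select(c => m.Element(c)?.Value ?? "")))
            // Keep every model of a group except the first one.
            .SelectMany(g => g.Skip(1))
            .Select(m => m.Element("CECID")?.Value)
            .Where(id => id != null)
            .ToList();
    }
}
```

Groups with a single model contribute nothing after `Skip(1)`, so no separate `Count() > 1` filter is needed. Unlike filtering with `x != firstdupe`, this also keeps later duplicates that happen to carry the same CECID as the first one.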
