简体   繁体   中英

Compare two xml and print the difference using LINQ

I am comparing two xml and I have to print the difference. How can I achieve this using LINQ. I know I can use XML diff patch by Microsoft but I prefer to use LINQ . If you have any other idea I will implement that

//First Xml

<Books>
 <book>  
  <id="20504" image="C01" name="C# in Depth">
 </book>  
 <book> 
  <id="20505" image="C02" name="ASP.NET">
 </book> 
 <book> 
  <id="20506" image="C03" name="LINQ in Action ">
 </book> 
 <book> 
  <id="20507" image="C04" name="Architecting Applications">
 </book> 
</Books>

//Second Xml

<Books>
  <book> 
    <id="20504" image="C011" name="C# in Depth">
  </book>
  <book> 
    <id="20505" image="C02" name="ASP.NET 2.0">
  </book>
  <book> 
    <id="20506" image="C03" name="LINQ in Action ">
  </book>
  <book> 
    <id="20508" image="C04" name="Architecting Applications">
  </book>
</Books>

I want to compare this two xml and print result like this.

Issued       Issue Type             IssueInFirst    IssueInSecond

1            image is different      C01              C011
2            name  is different      ASP.NET          ASP.NET 2.0
3            id  is different        20507            20508

Here is the solution:

//sanitised xmls:
string s1 = @"<Books>
                 <book id='20504' image='C01' name='C# in Depth'/>
                 <book id='20505' image='C02' name='ASP.NET'/>
                 <book id='20506' image='C03' name='LINQ in Action '/>
                 <book id='20507' image='C04' name='Architecting Applications'/>
                </Books>";
string s2 = @"<Books>
                  <book id='20504' image='C011' name='C# in Depth'/>
                  <book id='20505' image='C02' name='ASP.NET 2.0'/>
                  <book id='20506' image='C03' name='LINQ in Action '/>
                  <book id='20508' image='C04' name='Architecting Applications'/>
                </Books>";

XDocument xml1 = XDocument.Parse(s1);
XDocument xml2 = XDocument.Parse(s2);

//get cartesian product (i think)
var result1 =   from xmlBooks1 in xml1.Descendants("book")
                from xmlBooks2 in xml2.Descendants("book")
                select new { 
                            book1 = new {
                                        id=xmlBooks1.Attribute("id").Value,
                                        image=xmlBooks1.Attribute("image").Value,
                                        name=xmlBooks1.Attribute("name").Value
                                      }, 
                            book2 = new {
                                        id=xmlBooks2.Attribute("id").Value,
                                        image=xmlBooks2.Attribute("image").Value,
                                        name=xmlBooks2.Attribute("name").Value
                                      } 
                             };

//get every record that has at least one attribute the same, but not all
var result2 = from i in result1
                 where (i.book1.id == i.book2.id 
                        || i.book1.image == i.book2.image 
                        || i.book1.name == i.book2.name) &&
                        !(i.book1.id == i.book2.id 
                        && i.book1.image == i.book2.image 
                        && i.book1.name == i.book2.name) 
                 select i;



foreach (var aa in result2)
{
    //you do the output :D
}

Both linq statements probably could be merged, but I leave that as an exercise for you.

For fun, a general solution to grega g's reading of the problem. To illustrate my objection to this approach, I've introduced a "correct" entry for 'PowerShell in Action'.

string s1 = @"<Books>
     <book id='20504' image='C01' name='C# in Depth'/>
     <book id='20505' image='C02' name='ASP.NET'/>
     <book id='20506' image='C03' name='LINQ in Action '/>
     <book id='20507' image='C04' name='Architecting Applications'/>
     <book id='20508' image='C05' name='PowerShell in Action'/>
    </Books>";
string s2 = @"<Books>
     <book id='20504' image='C011' name='C# in Depth'/>
     <book id='20505' image='C02' name='ASP.NET 2.0'/>
     <book id='20506' image='C03' name='LINQ in Action '/>
     <book id='20508' image='C04' name='Architecting Applications'/>
     <book id='20508' image='C05' name='PowerShell in Action'/>
    </Books>";

XDocument xml1 = XDocument.Parse(s1);
XDocument xml2 = XDocument.Parse(s2);

var res = from b1 in xml1.Descendants("book")
          from b2 in xml2.Descendants("book")
          let issues = from a1 in b1.Attributes()
                       join a2 in b2.Attributes()
                         on a1.Name equals a2.Name
                       select new
                       {
                           Name = a1.Name,
                           Value1 = a1.Value,
                           Value2 = a2.Value
                       }
          where issues.Any(i => i.Value1 == i.Value2)
          from issue in issues
          where issue.Value1 != issue.Value2
          select issue;

Which reports the following:

{ Name = image, Value1 = C01, Value2 = C011 }
{ Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 }
{ Name = id, Value1 = 20507, Value2 = 20508 }
{ Name = image, Value1 = C05, Value2 = C04 }
{ Name = name, Value1 = PowerShell in Action, Value2 = Architecting Applications }

Note that the last two entries are the "conflict" between the 20508 typo and the otherwise correct 20508 entry.

The operation you want here is a Zip to pair up corresponding elements in your two sequences of books. That operator is being added in .NET 4.0 , but we can fake it by using Select to grab the books' indices and joining on that:

var res = from b1 in xml1.Descendants("book")
                         .Select((b, i) => new { b, i })
          join b2 in xml2.Descendants("book")
                         .Select((b, i) => new { b, i })
            on b1.i equals b2.i

We'll then use a second join to compare the values of attributes by name. Note that this is an inner join; if you did want to include attributes missing from one or the other you would have to do quite a bit more work.

          select new
          {
              Row = b1.i,
              Diff = from a1 in b1.b.Attributes()
                     join a2 in b2.b.Attributes()
                       on a1.Name equals a2.Name
                     where a1.Value != a2.Value
                     select new
                     {
                         Name = a1.Name,
                         Value1 = a1.Value,
                         Value2 = a2.Value
                     }
          };

The result will be a nested collection:

foreach (var b in res)
{
    Console.WriteLine("Row {0}: ", b.Row);
    foreach (var d in b.Diff)
        Console.WriteLine(d);
}

Or to get multiple rows per book:

var report = from r in res
             from d in r.Diff
             select new { r.Row, Diff = d };

foreach (var d in report)
    Console.WriteLine(d);

Which reports the following:

{ Row = 0, Diff = { Name = image, Value1 = C01, Value2 = C011 } }
{ Row = 1, Diff = { Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 } }
{ Row = 3, Diff = { Name = id, Value1 = 20507, Value2 = 20508 } }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM