
How to deserialize only part of a large XML file to C# classes?

I've already read some posts and articles on how to deserialize XML, but I still haven't figured out how to write the code to match my needs, so I apologize for yet another question about deserializing XML.

I have a large (50 MB) XML file which I need to deserialize. I used xsd.exe to get the XSD schema of the document and then auto-generated a C# classes file, which I put into my project. I want to get some (not all) of the data from this XML file and put it into my SQL database.

Here is the hierarchy of the file (simplified, xsd is very large):

public class yml_catalog 
{
    public yml_catalogShop[] shop { /*realization*/ }
}

public class yml_catalogShop
{
    public yml_catalogShopOffersOffer[][] offers { /*realization*/ }
}

public class yml_catalogShopOffersOffer
{
    // here goes all the data (properties) I want to obtain ))
}

And here is my code:

yml_catalogShopOffersOffer catalog;
var serializer = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
var reader = new StreamReader(@"C:\div_kid.xml");
catalog = (yml_catalogShopOffersOffer) serializer.Deserialize(reader); // exception occurs here
reader.Close();

I get an InvalidOperationException: There is an error in XML document (3, 2).

I also tried reading the file through an XmlReader:

XmlSerializer ser = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
yml_catalogShopOffersOffer result;
using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
{
    result = (yml_catalogShopOffersOffer)ser.Deserialize(reader); // exception occurs here
}

This gives an InvalidOperationException: There is an error in XML document (0, 0).

I tried to deserialize the entire file:

 XmlSerializer ser = new XmlSerializer(typeof(yml_catalog)); // exception occurs here
 yml_catalog result;
 using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
 {
     result = (yml_catalog)ser.Deserialize(reader);
 }

And I get the following:

error CS0030: Cannot convert type "yml_catalogShopOffersOffer[]" to "yml_catalogShopOffersOffer".

error CS0029: Cannot implicitly convert type "yml_catalogShopOffersOffer" to "yml_catalogShopOffersOffer[]".

So, how can I fix (or rewrite) the code so that it does not throw these exceptions?

Also, when I write:

XDocument doc = XDocument.Parse(@"C:\div_kid.xml");

An XmlException occurs: Data at the root level is invalid. Line 1, position 1.

Here is the first line of the XML file:

<?xml version="1.0" encoding="windows-1251"?>

A short example of the XML file:

<?xml version="1.0" encoding="windows-1251"?>
<!DOCTYPE yml_catalog SYSTEM "shops.dtd">
<yml_catalog date="2012-11-01 23:29">
<shop>
   <name>OZON.ru</name>
   <company>?????? "???????????????? ??????????????"</company>
   <url>http://www.ozon.ru/</url>
   <currencies>
     <currency id="RUR" rate="1" />
   </currencies>
   <categories>
      <category id="1126233">base category</category>
      <category id="1127479" parentId="1126233">bla bla bla</category>
      <!-- all the other categories go here -->
   </categories>
   <offers>
      <offer>
         <price></price>
         <picture></picture>
      </offer>
      <!-- other offers -->
   </offers>
</shop>
</yml_catalog>

I've already accepted the answer (it's perfect). But now I need to find the "base category" for each offer using its categoryId. The data is hierarchical, and the base category is the category that has no "parentId" attribute. So I wrote a recursive method to find the "base category", but it never finishes. It seems the algorithm is not very fast.
Here is my code: (in the main() method)

var doc = XDocument.Load(@"C:\div_kid.xml");
var offers = doc.Descendants("shop").Elements("offers").Elements("offer");
foreach (var offer in offers.Take(2))
{
    var category = GetCategory(categoryId, doc);
    // here goes other code
}

Helper method:

public static string GetCategory(int categoryId, XDocument document)
{
    var tempId = categoryId;
    var categories = document.Descendants("shop").Elements("categories").Elements("category");
    foreach (var category in categories)
    {
        if (category.Attribute("id").ToString() == categoryId.ToString())
        {
            if (category.Attributes().Count() == 1)
            {
                return category.ToString();
            }
            tempId = Convert.ToInt32(category.Attribute("parentId"));
        }
    }
    return GetCategory(tempId, document);
}

Can I use recursion in such a situation? If not, how else can I find the "base category"?

Give LINQ to XML a try:

XElement result = XElement.Load(@"C:\div_kid.xml");

Querying with LINQ is brilliant, but admittedly a little weird at the start. You select nodes from the document using a SQL-like query syntax or lambda expressions, then create anonymous objects (or use existing classes) containing the data you are interested in.

It is best seen in action.

Based on your sample XML and code, here's a specific example:

var element = XElement.Load(@"C:\div_kid.xml");
var shopsQuery =
    from shop in element.Descendants("shop")
    select new
    {
        Name = (string) shop.Descendants("name").FirstOrDefault(),
        Company = (string) shop.Descendants("company").FirstOrDefault(),
        Categories = 
            from category in shop.Descendants("category")
            select new {
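                // The string cast returns null instead of throwing when an attribute is missing,
                // which matters for base categories that have no parentId.
                Id = (string) category.Attribute("id"),
                Parent = (string) category.Attribute("parentId"),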
                Name = category.Value
            },
        Offers =
            from offer in shop.Descendants("offer")
            select new { 
                Price = (string) offer.Descendants("price").FirstOrDefault(),
                Picture = (string) offer.Descendants("picture").FirstOrDefault()
            }

    };

foreach (var shop in shopsQuery){
    Console.WriteLine(shop.Name);
    Console.WriteLine(shop.Company);
    foreach (var category in shop.Categories)
    {
        Console.WriteLine(category.Name);
        Console.WriteLine(category.Id);
    }
    foreach (var offer in shop.Offers)
    {
        Console.WriteLine(offer.Price);
        Console.WriteLine(offer.Picture);
    }
}  
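Since the file is large and only the offers are needed, a streaming variant may be worth considering instead of loading the whole document. The following is a minimal sketch, not a definitive implementation: `Offer` is a hypothetical stand-in for the xsd.exe-generated offer class, and `OfferStreaming`, `StreamOffers` and `ToOffer` are names made up for illustration. It reads the file with an `XmlReader`, materializes one `<offer>` element at a time with `XNode.ReadFrom`, and hands that element to an `XmlSerializer` that has been told to accept `offer` as its root element.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd.exe-generated offer class.
public class Offer
{
    [XmlElement("price")]
    public string Price { get; set; }

    [XmlElement("picture")]
    public string Picture { get; set; }
}

public static class OfferStreaming
{
    // The root override lets the serializer accept <offer> as a document root.
    // It is created once and reused rather than rebuilt for every element.
    private static readonly XmlSerializer OfferSerializer =
        new XmlSerializer(typeof(Offer), new XmlRootAttribute("offer"));

    // Yields one <offer> element at a time instead of loading the whole file.
    public static IEnumerable<XElement> StreamOffers(XmlReader reader)
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "offer")
            {
                // ReadFrom consumes the element and leaves the reader on the node
                // that follows it, so no extra Read() call is made here.
                yield return (XElement) XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Deserializes a single <offer> element into the (hypothetical) Offer class.
    public static Offer ToOffer(XElement offerElement)
    {
        using (XmlReader elementReader = offerElement.CreateReader())
        {
            return (Offer) OfferSerializer.Deserialize(elementReader);
        }
    }
}
```

To use it on the real file, the reader would be created with something like `XmlReader.Create(@"C:\div_kid.xml", new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore })`. The `DtdProcessing` setting is an assumption based on the DOCTYPE line in the sample, since the default setting rejects documents that contain a DTD.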

As an extra, here's how to build the tree of categories from the flat category elements. You need a proper class to hold them, because the list of Children must have a type:

class Category
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public List<Category> Children { get; set; }
    public IEnumerable<Category> Descendants {
        get
        {
            return (from child in Children
                    select child.Descendants).SelectMany(x => x).
                    Concat(new Category[] { this });
        }
    }
}

To create a list containing all distinct categories in the document:

var categories = (from category in element.Descendants("category")
                    orderby (int) category.Attribute("id")
                    select new Category()
                    {
                        Id = (int) category.Attribute("id"),
                        // The int? cast yields null when the attribute is absent.
                        ParentId = (int?) category.Attribute("parentId"),
                        Children = new List<Category>()
                    })
                    // Distinct() would compare references here, so deduplicate by Id instead.
                    .GroupBy(cat => cat.Id)
                    .Select(group => group.First())
                    .ToList();

Then organize them into a tree (heavily borrowed from "flat list to hierarchy"):

var lookup = categories.ToLookup(cat => cat.ParentId);
foreach (var category in categories)
{
    category.Children = lookup[category.Id].ToList();
}
var rootCategories = lookup[null].ToList();

To find the root that contains theCategory:

var root = (from cat in rootCategories
            where cat.Descendants.Contains(theCategory)
            select cat).FirstOrDefault();
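As for the recursive helper in the question: `category.Attribute("id").ToString()` returns the whole attribute (`id="1126233"`), not just its value, so the comparison likely never matches and the method keeps calling itself with the same id. If only the base category id is needed, building the full tree may not be necessary. The following is a rough sketch of an alternative, assuming every `<category>` element carries a numeric `id` attribute; `CategoryLookup`, `BuildParentMap` and `FindBaseCategoryId` are names made up for illustration. It builds an id-to-parent dictionary once and then walks up the parent chain iteratively.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CategoryLookup
{
    // Maps each category id to its parent id (null for a base category).
    // Duplicate ids are collapsed; the first occurrence wins.
    public static Dictionary<int, int?> BuildParentMap(XContainer document)
    {
        return document.Descendants("category")
            .GroupBy(category => (int) category.Attribute("id"))
            .ToDictionary(
                group => group.Key,
                group => (int?) group.First().Attribute("parentId"));
    }

    // Walks up the parent chain. Returns null if an id in the chain is
    // unknown or if the chain loops back on itself.
    public static int? FindBaseCategoryId(int categoryId, IReadOnlyDictionary<int, int?> parents)
    {
        var visited = new HashSet<int>();
        int current = categoryId;
        while (visited.Add(current))
        {
            int? parent;
            if (!parents.TryGetValue(current, out parent))
            {
                return null; // id not present in the document
            }
            if (parent == null)
            {
                return current; // no parentId: this is the base category
            }
            current = parent.Value;
        }
        return null; // a cycle was detected
    }
}
```

The map is built once, outside the loop over offers, so each lookup only costs as many dictionary reads as the category is deep.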
