
How can I populate C# classes from an XML document that has some embedded data?

I have an API that has returned this:

http://services.aonaware.com/DictService/DictService.asmx?op=DefineInDict

<?xml version="1.0" encoding="utf-8"?>
<WordDefinition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://services.aonaware.com/webservices/">
  <Word>abandon</Word>
  <Definitions>
    <Definition>
      <Word>abandon</Word>
      <Dictionary>
        <Id>wn</Id>
        <Name>WordNet (r) 2.0</Name>
      </Dictionary>
      <WordDefinition>abandon
     n 1: the trait of lacking restraint or control; freedom from
          inhibition or worry; "she danced with abandon" [syn: {wantonness},
           {unconstraint}]
     2: a feeling of extreme emotional intensity; "the wildness of
        his anger" [syn: {wildness}]
     v 1: forsake, leave behind; "We abandoned the old car in the
          empty parking lot"
     2: stop maintaining or insisting on; of ideas, claims, etc.;
        "He abandoned the thought of asking for her hand in
        marriage"; "Both sides have to give up some calims in
        these negociations" [syn: {give up}]
     3: give up with the intent of never claiming again; "Abandon
        your life to God"; "She gave up her children to her
        ex-husband when she moved to Tahiti"; "We gave the
        drowning victim up for dead" [syn: {give up}]
     4: leave behind empty; move out of; "You must vacate your
        office by tonight" [syn: {vacate}, {empty}]
     5: leave someone who needs or counts on you; leave in the
        lurch; "The mother deserted her children" [syn: {forsake},
         {desolate}, {desert}]
</WordDefinition>
    </Definition>
  </Definitions>
</WordDefinition>

Here is the code that I used to retrieve the XML data:

        WebRequest request = WebRequest.Create("http://services.aonaware.com/DictService/DictService.asmx/DefineInDict");
        request.Method = "POST";
        string postData = "dictId=wn&word=abandon";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = byteArray.Length;
        Stream dataStream = request.GetRequestStream();
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Close();
        WebResponse response = request.GetResponse();
        Console.WriteLine(((HttpWebResponse)response).StatusDescription);
        dataStream = response.GetResponseStream();
        StreamReader reader = new StreamReader(dataStream);
        string responseFromServer = reader.ReadToEnd();
        Console.WriteLine(responseFromServer);
        reader.Close();
        dataStream.Close();
        response.Close();

I would like to extract the data from the XML into a List<Definition>, where the Definition class looks like:

public class Def
{
    public string text { get; set; }
    public List<string> synonym { get; set; }
}

public class Definition
{
    public string type { get; set; } // single character: n or v or a 
    public List<Def> Def { get; set; }
}

Can someone advise me on how to do this, and show what options are available for picking the class elements out of the XML and putting them into classes?

As I think this question could be helpful to many other people, I'll open a large bounty, so hopefully someone can take the time to come up with a good example.

Update:

Sorry, I made a mistake with Synonym. I have changed this now, and I hope it makes more sense. The synonyms are just a List<string>. I have also put in bold what I need, as the two answers so far don't seem to answer the question at all. Thank you.

I created a simple parser for the word definition (pretty sure there's room for improvements here):

Solution 1.0

class ParseyMcParseface
{
    /// <summary>
    /// Word definition lines
    /// </summary>
    private string[] _text;

    /// <summary>
    /// Constructor (takes the innerText of the WordDefinition tag as input)
    /// </summary>
    /// <param name="text">innerText of the WordDefinition</param>
    public ParseyMcParseface(string text)
    {
        _text = text.Split(new [] {'\n'}, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1) // Skip the first line where the word is mentioned
            .ToArray();
    }

    /// <summary>
    /// Convert from single letter type to full human readable type
    /// </summary>
    /// <param name="c"></param>
    /// <returns></returns>
    private string CharToType(char c)
    {
        switch (c)
        {
            case 'a':
                return "Adjective";
            case 'n':
                return "Noun";
            case 'v':
                return "Verb";
            default:
                return "Unknown";
        }
    }

    /// <summary>
    /// Reorganize the data for easier parsing
    /// </summary>
    /// <param name="text">Lines of text</param>
    /// <returns></returns>
    private static List<List<string>> MakeLists(IEnumerable<string> text)
    {
        List<List<string>> types = new List<List<string>>();
        int i = -1;
        int j = 0;
        foreach (var line in text)
        {
            // New type (Noun, Verb, Adj.)
            if (Regex.IsMatch(line.Trim(), "^[avn]{1}\\ \\d+"))
            {
                types.Add(new List<string> { line.Trim() });
                i++;
                j = 0;
            }
            // New definition in the previous type
            else if (Regex.IsMatch(line.Trim(), "^\\d+"))
            {
                j++;
                types[i].Add(line.Trim());
            }
            // New line of the same definition
            else
            {
                types[i][j] = types[i][j] + " " + line.Trim();
            }
        }

        return types;
    }

    public List<Definition> Parse()
    {
        var definitionsLines = MakeLists(_text);

        List<Definition> definitions = new List<Definition>();

        foreach (var type in definitionsLines)
        {

            var defs = new List<Def>();
            foreach (var def in type)
            {
                var match = Regex.Match(def.Trim(), "(?:\\:\\ )(\\w|\\ |;|\"|,|\\.|-)*[\\[]{0,1}");
                MatchCollection syns = Regex.Matches(def.Trim(), "\\{(\\w|\\ )+\\}");

                List<string> synonymes = new List<string>();
                foreach (Match syn in syns)
                {
                    synonymes.Add(syn.Value.Trim('{', '}'));
                }

                defs.Add(new Def()
                {
                    text = match.Value.Trim(':', '[', ' '),
                    synonym = synonymes
                });
            }


            definitions.Add(new Definition
            {
                type = CharToType(type[0][0]),
                Def = defs
            });
        }
        return definitions;
    }
}

And here's a usage example:

WebRequest request = WebRequest.Create("http://services.aonaware.com/DictService/DictService.asmx/DefineInDict");
request.Method = "POST";
string postData = "dictId=wn&word=abandon";
byte[] byteArray = Encoding.UTF8.GetBytes(postData);
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = byteArray.Length;
Stream dataStream = request.GetRequestStream();
dataStream.Write(byteArray, 0, byteArray.Length);
dataStream.Close();
WebResponse response = request.GetResponse();
Console.WriteLine(((HttpWebResponse)response).StatusDescription);
dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();


var doc = new XmlDocument();
doc.LoadXml(responseFromServer);
// Index 0 is the root <WordDefinition>; index 1 is the nested one that holds the definition text.
var el = doc.GetElementsByTagName("WordDefinition");

ParseyMcParseface parseyMcParseface = new ParseyMcParseface(el[1].InnerText);
var parsingResult = parseyMcParseface.Parse();
// parsingResult will contain a list of Definitions
// per the format specified in the question.

And here's a live demo: https://dotnetfiddle.net/24IQ67
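
As an alternative to indexing into the result of GetElementsByTagName, the nested element can probably be selected by its namespace-qualified name with LINQ to XML. The following is only a minimal sketch: the class and method names (WordDefinitionExtractor, ExtractDefinitionTexts) are hypothetical, and the namespace URI is copied from the sample response in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class WordDefinitionExtractor
{
    // Default namespace declared on the root element of the sample response.
    private static readonly XNamespace Ns = "http://services.aonaware.com/webservices/";

    // Returns the free text of every <Definition>/<WordDefinition> element.
    public static string[] ExtractDefinitionTexts(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
            .Descendants(Ns + "Definition")
            .Select(d => (string)d.Element(Ns + "WordDefinition"))
            .Where(text => text != null)
            .ToArray();
    }
}
```

Each returned string could then be passed to the ParseyMcParseface constructor in place of el[1].InnerText. Selecting by name avoids relying on the position of the nested element in the node list.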

You can also avoid manually retrieving then parsing the XML by adding a reference to that webservice.

Solution 2.0

I've made a little app that does that then parses the definition. It is hosted here on GitHub (it's too big to post here on StackOverflow):

public enum WordTypes
{
    Noun,
    Verb,
    Adjective,
    Adverb,
    Unknown
}

public class Definition
{
    public Definition()
    {
        Synonyms = new List<string>();
        Anotnyms = new List<string>();
    }
    public WordTypes WordType { get; set; }
    public string DefinitionText { get; set; }
    public List<string> Synonyms { get; set; }
    public List<string> Anotnyms { get; set; }

}

static class DefinitionParser
{
    public static List<Definition> Parse(string wordDefinition)
    {
        var wordDefinitionLines = wordDefinition.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)
            .Select(x => x.Trim())
            .ToList();

        var flatenedList = MakeLists(wordDefinitionLines).SelectMany(x => x).ToList();

        var result = new List<Definition>();
        foreach (var wd in flatenedList)
        {
            var foundMatch = Regex.Match(wd, @"^(?<matchType>adv|adj|v|n){0,1}\s*(\d*): (?<definition>[\w\s;""',\.\(\)\!\-]+)(?<extraInfoSyns>\[syn: ((?<wordSyn>\{[\w\s\-]+\})|(?:[,\ ]))*\]){0,1}\s*(?<extraInfoAnts>\[ant: ((?<wordAnt>\{[\w\s-]+\})|(?:[,\ ]))*\]){0,1}");

            var def = new Definition();

            if (foundMatch.Groups["matchType"].Success)
            {
                var matchType = foundMatch.Groups["matchType"];
                def.WordType = DefinitionTypeToEnum(matchType.Value);
            }

            if (foundMatch.Groups["definition"].Success)
            {
                var definition = foundMatch.Groups["definition"];
                def.DefinitionText = definition.Value;
            }

            if (foundMatch.Groups["extraInfoSyns"].Success && foundMatch.Groups["wordSyn"].Success)
            {
                foreach (Capture capture in foundMatch.Groups["wordSyn"].Captures)
                {
                    def.Synonyms.Add(capture.Value.Trim('{','}'));
                }
            }

            if (foundMatch.Groups["extraInfoAnts"].Success && foundMatch.Groups["wordAnt"].Success)
            {
                foreach (Capture capture in foundMatch.Groups["wordAnt"].Captures)
                {
                    def.Anotnyms.Add(capture.Value.Trim('{', '}'));
                }
            }

            result.Add(def);
        }
        return result;
    }

    private static List<List<string>> MakeLists(IEnumerable<string> text)
    {
        List<List<string>> types = new List<List<string>>();
        int i = -1;
        int j = 0;
        foreach (var line in text)
        {
            // New type (Noun, Verb, Adj.)
            if (Regex.IsMatch(line, "^(adj|v|n|adv){1}\\s\\d*"))
            {
                types.Add(new List<string> { line });
                i++;
                j = 0;
            }
            // New definition in the previous type
            else if (Regex.IsMatch(line, "^\\d+"))
            {
                j++;
                types[i].Add(line);
            }
            // New line of the same definition
            else
            {
                types[i][j] = types[i][j] + " " + line;
            }
        }

        return types;
    }

    private static WordTypes DefinitionTypeToEnum(string input)
    {
        switch (input)
        {
            case "adj":
                return WordTypes.Adjective;
            case "adv":
                return WordTypes.Adverb;
            case "n":
                return WordTypes.Noun;
            case "v":
                return WordTypes.Verb;
            default:
                return WordTypes.Unknown;
        }
    }
}


Notes:

  • This should work as expected
  • Parsing free text is not reliable
  • You should import the service reference (as noted in the other answer) instead of parsing the XML manually.

Alexander Petrov's answer would be perfect for you, except that you're dealing with a wonky XML schema. If WordNet is a real outfit, they should rework the schema to remove the nested WordDefinition elements and add new elements for the essential parts of a definition.

This quick solution will work for the specific test case you have provided, but it relies on many assumptions about the nature of the text. It also relies on string manipulation and regular expressions, which can be slow and brittle on free text, so it may be too slow or error-prone for your requirements. You may receive better solutions for this task if you tailor your question to the string manipulation problem domain. But the correct solution is to get a better XML schema.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

namespace DefinitionTest
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Definition> definitions = new List<Definition>();

            // The starting point after your web service call.
            string responseFromServer = EmulateWebService();

            // Load the string into this object in order to parse the xml.
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(responseFromServer);

            XmlNodeList elemList = doc.GetElementsByTagName("WordDefinition");
            for (int i = 0; i < elemList.Count; i++)
            {
                XmlNode def = elemList[i];

                // We only want WordDefinition elements that have just one child which is the content we need.
                // Any WordDefinition that has zero children or more than one child is either empty or a parent element.
                if (def.ChildNodes.Count == 1)
                {
                    Console.WriteLine(string.Format("Content of WordDefinition {0}", i));
                    Console.WriteLine();
                    Console.WriteLine(def.InnerXml);
                    Console.WriteLine();

                    definitions.Add(ParseWordDefinition(def.InnerXml));

                    foreach (Definition dd in definitions)
                    {
                        Console.WriteLine(string.Format("Parsed Word Definition for \"{0}\"", dd.wordDefined));
                        Console.WriteLine();
                        foreach (Def d in dd.Defs)
                        {
                            string type = string.Empty;
                            switch (d.type)
                            {
                                case "a":
                                    type = "Adjective";
                                    break;
                                case "n":
                                    type = "Noun";
                                    break;
                                case "v":
                                    type = "Verb";
                                    break;
                                default:
                                    type = "";
                                    break;
                            }
                            Console.WriteLine(string.Format("Type \"{0}\"", type));
                            Console.WriteLine();
                            Console.WriteLine(string.Format("\tDefinition \"{0}\"", d.text));
                            Console.WriteLine();
                            if (d.Synonym != null && d.Synonym.Count > 0)
                            {
                                Console.WriteLine("\tSynonyms");
                                foreach (string syn in d.Synonym)
                                    Console.WriteLine("\t\t" + syn);
                            }
                        }
                    }
                }
            }
        }

        static string EmulateWebService()
        {
            string result = string.Empty;

            // The "definition.xml" file is a copy of the test data you provided.
            using (StreamReader reader = new StreamReader(@"c:\projects\definitiontest\definitiontest\definition.xml"))
            {
                result = reader.ReadToEnd();
            }
            return result;
        }

        static Definition ParseWordDefinition(string xmlDef)
        {
            // Collapse every run of white space (including line breaks) into a single space.
            // Replacing only Environment.NewLine can miss bare "\n" line endings on Windows.
            string squeezedLine = Regex.Replace(xmlDef, @"\s+", " ").Trim();

            // Assumption 1: The first word in the string is always the word being defined.
            string[] wordAndDefs = squeezedLine.Split(new char[] { ' ' }, StringSplitOptions.None);
            string wordDefined = wordAndDefs[0];
            string allDefinitions = string.Join(" ", wordAndDefs, 1, wordAndDefs.Length - 1);

            Definition parsedDefinition = new Definition();
            parsedDefinition.wordDefined = wordDefined;
            parsedDefinition.Defs = new List<Def>();

            string type = string.Empty;

            // Assumption 2: All definitions are delimited by a type letter, a number and a ':' character.
            string[] subDefinitions = Regex.Split(allDefinitions, @"(n|v|a){0,1}\s\d{1,}:");
            foreach (string definitionPart in subDefinitions)
            {
                if (string.IsNullOrEmpty(definitionPart))
                    continue;

                if (definitionPart == "n" || definitionPart == "v" || definitionPart == "a")
                {
                    type = definitionPart;
                }
                else
                {
                    Def def = new Def();
                    def.type = type;

                    // Assumption 3. Synonyms always use the [syn: {..},... ] pattern.
                    string realDef = (Regex.Split(definitionPart, @"\[\s*syn:"))[0];
                    def.text = realDef;

                    MatchCollection syns = Regex.Matches(definitionPart, @"\{([a-zA-Z\s]{1,})\}");
                    if (syns.Count > 0)
                        def.Synonym = new List<string>();

                    foreach (Match match in syns)
                    {
                        // Group 1 holds the text between the braces.
                        def.Synonym.Add(match.Groups[1].Value.Trim());
                    }
                    parsedDefinition.Defs.Add(def);
                }
            }
            return parsedDefinition;
        }
    }

    public class Def
    {
        // Moved your type from Definition to Def, since it made more sense to me.
        public string type { get; set; } // single character: n or v or a 
        public string text { get; set; }
        // Changed your synonym definition here.
        public List<string> Synonym { get; set; }
    }

    public class Definition
    {
        public string wordDefined { get; set; }
        // Changed Def to Defs.
        public List<Def> Defs { get; set; }
    }
}

Why handmade? Let's do everything automatically, because we're programmers!

  1. Right-click the project and choose Add Service Reference.
  2. Enter http://services.aonaware.com/DictService/DictService.asmx in the Address field.
  3. Set the desired Namespace.
  4. Optionally, specify additional settings by clicking the Advanced button.
  5. Click OK.

A set of classes for working with the service will be generated.
Then just use these classes.

Note that the settings needed to use the service are added to your application's App.config or Web.config. We use them next.

An example of using these classes (don't forget to add a using directive for the namespace you chose):

var client = new DictServiceSoapClient("DictServiceSoap");
var wordDefinition = client.DefineInDict("wn", "abandon");

That's all!

The argument passed to the DictServiceSoapClient constructor is the name of the configuration entry in the config file.

wordDefinition now holds the result of the request. Let's get the information from it:

Console.WriteLine(wordDefinition.Word);
Console.WriteLine();

foreach (var definition in wordDefinition.Definitions)
{
    Console.WriteLine("Word: " + definition.Word);
    Console.WriteLine("Word Definition: " + definition.WordDefinition);

    Console.WriteLine("Id: " + definition.Dictionary.Id);
    Console.WriteLine("Name: " + definition.Dictionary.Name);
}
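
If adding a service reference is not an option, a similar set of classes can be written by hand and filled with XmlSerializer. The sketch below is an assumption-laden illustration, not the generated proxy: the DTO class names and the DictXml helper are hypothetical, and the element names and namespace are inferred from the sample response in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical DTOs; element names are taken from the sample response.
[XmlRoot("WordDefinition", Namespace = DictXml.Namespace)]
public class WordDefinitionDto
{
    public string Word { get; set; }

    [XmlArray("Definitions")]
    [XmlArrayItem("Definition")]
    public List<DefinitionDto> Definitions { get; set; }
}

public class DefinitionDto
{
    public string Word { get; set; }
    public DictionaryDto Dictionary { get; set; }

    // The nested <WordDefinition> element carries the free-text definition.
    [XmlElement("WordDefinition")]
    public string Text { get; set; }
}

public class DictionaryDto
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class DictXml
{
    public const string Namespace = "http://services.aonaware.com/webservices/";

    public static WordDefinitionDto Deserialize(string xml)
    {
        // The second argument sets the default namespace for all elements.
        var serializer = new XmlSerializer(typeof(WordDefinitionDto), Namespace);
        using (var reader = new StringReader(xml))
        {
            return (WordDefinitionDto)serializer.Deserialize(reader);
        }
    }
}
```

Either way, the definition text itself remains a single free-text string, so it still has to be parsed separately to obtain the type, definitions and synonyms.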

