
What is the most efficient way to take XML from an API and store it locally?

I am trying to find the fastest way to read XML from the Merriam-Webster dictionary API and store it in a local file for later use. Below, I try to implement a module that does a few things:

  1. Read 2000 words from a local directory
  2. Look up each of the words in the Merriam-Webster dictionary using the API
  3. Store the definition(s) in a local XML file for later use.

I'm not sure if making an XML file is the best way to store this data, but it seemed like the simplest thing to do. At first, I thought I would do it in separate steps: (1) look up each word and store the word and its definitions in a data structure; (2) dump all the data into XML. However, this poses a problem, because it is just too much data to store on the runtime (call) stack.

So, in this scenario, I try to speed things up by looking up each word and then saving it to the XML file one by one. This, however, is also a slow method. It's taking me around 10 minutes per 500-600 words.

public void load_module() // stores words/definitions into xml file
    { // 1. Pick up word from text file     2. Look up word's definition    3. Store in Xml 
        string workdirect = Directory.GetCurrentDirectory();
        workdirect = workdirect.Substring(0, workdirect.LastIndexOf("bin"));
        workdirect += "words1.txt";
        using (StreamReader read = new StreamReader(workdirect)) // 1. Pick up word from text file 
        {
            while (!read.EndOfStream)
            {
                string line = read.ReadLine(); 
                var definitions = load(line.ToLower());    // 2. Retrieve Words Definitions

                store_xml(line, definitions);
                wordlist.Add(line);
            }
        }
    }

    public List<string> load(string word)
    {
        XmlDocument doc = new XmlDocument();

        List<string> definitions = new List<string>();
        XmlNodeList node = null;

        doc.Load("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/"+word+"?key=*****************"); // Asterisks hide the actual API key

        if (doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def") == null)
        {
            return definitions;
        }
        node = doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def").SelectNodes("dt");

        // TO DO : implement definitions if there is no node "def" in first node entry "entry_list"

        foreach (XmlNode item in node)
        {
            definitions.Add(item.InnerXml.ToString().ToLower());
        }


        return definitions;

    }

    public void store_xml(string word, List<string> definitions)
    {
        string local = Directory.GetCurrentDirectory();
        string name = "dictionary_word.xml";
        local = local.Substring(0, local.LastIndexOf("bin"));
        bool exists = File.Exists(local + name);

        if (exists)
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(local + name);
            XmlElement wordindoc = doc.CreateElement("Word");
            wordindoc.SetAttribute("xmlns", word);
            XmlElement defs = doc.CreateElement("Definitions");
            foreach (var item in definitions)
            {
                XmlElement def = doc.CreateElement("Definition");
                def.InnerText = item;
                defs.AppendChild(def);
            }
            wordindoc.AppendChild(defs);
            doc.DocumentElement.AppendChild(wordindoc);
            doc.Save(local+name);
        }
        else
        {
            using (XmlWriter writer = XmlWriter.Create(@local + name))
            {
                writer.WriteStartDocument();

                writer.WriteStartElement("Dictionary");

                writer.WriteStartElement("Word", word);

                writer.WriteStartElement("Definitions");
                foreach (var def in definitions)
                {
                    writer.WriteElementString("Definition", def);
                }
                writer.WriteEndElement();
                writer.WriteEndElement();

                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
        }           
    }
}

When handling large amounts of data that need to be exported to XML, I would normally keep the data in memory as a collection of custom objects rather than as an XmlDocument:

public class WordDefinition
{
    public string Word { get; set; }
    public string Definition { get; set; }
}

I would then use XmlWriter to write the collection to the XML file:

using System.Text;
using System.Xml;

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "    ";
settings.Encoding = Encoding.UTF8;
// Verbatim string (@) so the backslashes in the path are not treated as escape sequences
using (XmlWriter writer = XmlWriter.Create(@"C:\output\output.xml", settings))
{
    writer.WriteStartDocument();
    // TODO - use XmlWriter functions to write out each word and definition
    writer.Flush();
}
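
As a rough illustration of what the TODO could look like, here is a minimal sketch that writes a whole collection in one pass, so the output is opened and closed exactly once. The `WordEntry` type, the `DictionaryXml.Write` helper, and the `name` attribute are hypothetical names chosen for this example; the element names are borrowed from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical container: one word with all of its definitions.
public class WordEntry
{
    public string Word { get; set; }
    public List<string> Definitions { get; set; } = new List<string>();
}

public static class DictionaryXml
{
    // Writes every entry to the given output in a single pass.
    public static void Write(TextWriter output, IEnumerable<WordEntry> entries)
    {
        var settings = new XmlWriterSettings { Indent = true, IndentChars = "    " };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Dictionary");
            foreach (WordEntry entry in entries)
            {
                writer.WriteStartElement("Word");
                writer.WriteAttributeString("name", entry.Word); // assumed attribute name
                writer.WriteStartElement("Definitions");
                foreach (string definition in entry.Definitions)
                {
                    writer.WriteElementString("Definition", definition);
                }
                writer.WriteEndElement(); // Definitions
                writer.WriteEndElement(); // Word
            }
            writer.WriteEndElement(); // Dictionary
            writer.WriteEndDocument();
        }
    }
}
```

To write to disk, a `StreamWriter` opened on the target path can be passed as `output`. Compared with loading and re-saving the whole file for every word, this touches the file only once.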

If you are still short on memory, you might be able to write out the XML in batches (e.g. every 500 definitions).
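
One possible way to do the batching is to keep a single XmlWriter open, write each word as soon as it has been looked up, and flush every so often. This is only a sketch: `StreamingDictionaryWriter`, `WriteAll`, the `lookup` delegate (a stand-in for the API call), and the `name` attribute are all hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingDictionaryWriter
{
    // lookup stands in for the dictionary API call (an assumption of this sketch).
    // batchSize controls how often buffered output is pushed to the underlying writer.
    public static int WriteAll(TextWriter output, IEnumerable<string> words,
                               Func<string, IEnumerable<string>> lookup, int batchSize = 500)
    {
        int written = 0;
        using (XmlWriter writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true }))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Dictionary");
            foreach (string word in words)
            {
                writer.WriteStartElement("Word");
                writer.WriteAttributeString("name", word); // assumed attribute name
                foreach (string definition in lookup(word))
                {
                    writer.WriteElementString("Definition", definition);
                }
                writer.WriteEndElement(); // Word
                written++;
                if (written % batchSize == 0)
                {
                    writer.Flush(); // end of a batch: push buffered XML out
                }
            }
            writer.WriteEndElement(); // Dictionary
            writer.WriteEndDocument();
        }
        return written;
    }
}
```

Because nothing is accumulated between words, memory use stays roughly constant no matter how many words are processed.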

I found the Microsoft article on Improving XML Performance a very useful reference, particularly the section on Design Considerations.
