[英]What is the most efficient way to take XML from API and store it locally?
我正在嘗試找到從merriam webster字典中讀取XML並將其存儲到本地文件以供以后使用的最快方法。 下面,我嘗試實現一個執行以下操作的模塊:
我不確定制作XML是否是存儲此數據的最佳方法,但這似乎是最簡單的事情。 起初,我以為我會以不同的步驟來做。 (1.查找單詞,將單詞和定義存儲到數據結構中。2.將所有數據轉儲到XML中。)但是,這帶來了一個問題,因為它太多的東西無法存儲在運行時(調用)堆棧中。
因此,在這種情況下,我嘗試通過查找每個單詞然后將其逐個保存到xml來加快處理速度。 但是,這也是一種緩慢的方法。 每500-600個字使我花費大約10分鍾。
public void load_module() // stores words/definitions into xml file
{ // 1. Pick up word from text file 2. Look up word's definition 3. Store in Xml
string workdirect = Directory.GetCurrentDirectory();
workdirect = workdirect.Substring(0, workdirect.LastIndexOf("bin"));
workdirect += "words1.txt";
using (StreamReader read = new StreamReader(workdirect)) // 1. Pick up word from text file
{
while (!read.EndOfStream)
{
string line = read.ReadLine();
var definitions = load(line.ToLower()); // 2. Retrieve Words Definitions
store_xml(line, definitions);
wordlist.Add(line);
}
}
}
public List<string> load(string word)
{
XmlDocument doc = new XmlDocument();
List<string> definitions = new List<string>();
XmlNodeList node = null;
doc.Load("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/"+word+"?key=*****************"); // Asteriks to hide the actual API key
if (doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def") == null)
{
return definitions;
}
node = doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def").SelectNodes("dt");
// TO DO : implement definitions if there is no node "def" in first node entry "entry_list"
foreach (XmlNode item in node)
{
definitions.Add(item.InnerXml.ToString().ToLower());
}
return definitions;
}
public void store_xml(string word, List<string> definitions)
{
string local = Directory.GetCurrentDirectory();
string name = "dictionary_word.xml";
local = local.Substring(0, local.LastIndexOf("bin"));
bool exists = File.Exists(local + name);
if (exists)
{
XmlDocument doc = new XmlDocument();
doc.Load(local + name);
XmlElement wordindoc = doc.CreateElement("Word");
wordindoc.SetAttribute("xmlns", word);
XmlElement defs = doc.CreateElement("Definitions");
foreach (var item in definitions)
{
XmlElement def = doc.CreateElement("Definition");
def.InnerText = item;
defs.AppendChild(def);
}
wordindoc.AppendChild(defs);
doc.DocumentElement.AppendChild(wordindoc);
doc.Save(local+name);
}
else
{
using (XmlWriter writer = XmlWriter.Create(@local + name))
{
writer.WriteStartDocument();
writer.WriteStartElement("Dictionary");
writer.WriteStartElement("Word", word);
writer.WriteStartElement("Definitions");
foreach (var def in definitions)
{
writer.WriteElementString("Definition", def);
}
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
}
}
}
}
當處理大量需要導出到XML的數據時,我通常將數據作為自定義對象的集合而不是XMLDocument保留在內存中:
public class Definition
{
public string Word { get; set; }
public string Definition { get; set; }
}
然后,我將使用XMLWriter將集合寫入XML文件:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.Encoding = Encoding.UTF8;
using (XmlWriter writer = XmlWriter.Create("C:\output\output.xml", settings))
{
writer.WriteStartDocument();
// TODO - use XMLWriter functions to write out each word and definition
writer.Flush();
}
如果仍然缺少內存,則可以批量寫出XML(例如,每500個定義)。
我發現有關改善XML性能的Microsoft文章是非常有用的參考,特別是有關“設計注意事項”的部分。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.