Extracting XML data, modifying it and storing in excel file

I am new to ASP.NET. I have an XML file as follows:

<?xml version="1.0" encoding="iso-8859-1" ?>
<newsitem itemid="10000" id="root" date="1996-08-22" xml:lang="en">
  <title>CHINA: China says hopeful on global nuclear test ban.</title>
  <headline>China says hopeful on global nuclear test ban.</headline>
  <dateline>BEIJING 1996-08-22</dateline>
  <text>
    <p>China said on Thursday it was hopeful a global nuclear test ban treaty could be approved by the U.N. </p>
    <p>&quot;China hopes that the treaty could be open for signature by the end of the year and that there .</p>
    </text>
.....continue

The XML file is huge. I have to process only the terms in the <title> and <text> fields of each news item, and I also have to count the frequency of those words.

I tried to extract the text from the title and text fields. I get data for the title field but not for the text field. Moreover, the title values I get are not unique; the same entries are repeated. Please help me.

The code I tried is:

    string filename = Server.MapPath("demo1.xml");
    XmlTextReader reader = new XmlTextReader(filename);
    XmlNodeType type;

    while (reader.Read())
    {
        type = reader.NodeType;

        if (type == XmlNodeType.Element)
        {
            if (reader.Name == "text")
            {
                reader.Read();
                TextBox1.Text = reader.Value;
            }

            if (reader.Name == "title")
            {
                reader.Read();
                ListBox1.Items.Add(reader.Value);
            }
        }
    }
    reader.Close();
}

The list box gets data, but the text box stays empty. I also need to process the whole of this huge XML file, count the occurrences of each word (for example china-2, says-1), and store the counts in Excel. Should I use a StringBuilder, and if so, how?

This should get you started:

// Requires: using System; using System.Linq; using System.Xml.Linq;
// Loading by path avoids leaving a FileStream open.
var xml = XElement.Load(@"C:\TEMP\TEST.xml");

var titleElement = xml.Elements("title").SingleOrDefault();
var title = titleElement != null ? titleElement.Value : String.Empty;
var textElement = xml.Elements("text").SingleOrDefault();
// Join the child values with a space so words at paragraph boundaries do not run together.
var text = textElement != null
               ? String.Join(" ", textElement.Elements()
                                             .Select(t => t.Value))
               : String.Empty;

I am using your above XML snippet as an example. You'll want to adapt it to your final XML structure, but I think with the above pattern you should be able to make it suit your needs.
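If the real file holds many <newsitem> elements and is too large to load in one go, one possible adaptation is to stream it with XmlReader and materialise one item at a time. The sketch below is only an illustration under assumptions: the class and method names (NewsItemReader, ReadNewsItems, TitleAndText) are made up for this example, and it assumes each item looks like the snippet in the question, with <p> children inside <text>. It also hints at why the original loop left the text box empty: the node directly after the <text> start tag is whitespace, not the paragraph text, which sits inside the <p> children.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class NewsItemReader
{
    // Lazily yields each <newsitem> element so that only one item
    // is held in memory at a time. The caller owns the reader.
    public static IEnumerable<XElement> ReadNewsItems(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "newsitem")
            {
                // ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Returns the title followed by all <p> paragraphs, separated by single spaces.
    public static string TitleAndText(XElement item)
    {
        var title = (string)item.Element("title") ?? string.Empty;
        var paragraphs = item.Elements("text")
                             .Elements("p")
                             .Select(p => p.Value.Trim());
        return (title + " " + string.Join(" ", paragraphs)).Trim();
    }
}
```

A caller would typically create the reader with XmlReader.Create(path), which honours the encoding declared in the file, wrap it in a using block, and loop over ReadNewsItems(reader).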

The variable title will be the text of the <title> element and the variable text will be the concatenated text of all elements found within the <text> element. In this way you end up with String variables which you can perform standard text processing on to achieve your goal of getting word counts, etc.
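As a rough sketch of that text-processing step, the helper below counts word frequencies in a string and renders them as CSV, which Excel can open directly. The names WordFrequency, Count and ToCsv are invented for this illustration, and the definition of a "word" (a run of letters or digits, lowercased) is an assumption; under it, "U.N." is counted as "u" and "n". A StringBuilder is used only for assembling the CSV output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // Lowercases the input, extracts runs of letters/digits, and counts each one.
    public static Dictionary<string, int> Count(string input)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (Match m in Regex.Matches(input.ToLowerInvariant(), @"[\p{L}\p{Nd}]+"))
        {
            counts.TryGetValue(m.Value, out int n); // n is 0 when the word is new
            counts[m.Value] = n + 1;
        }
        return counts;
    }

    // Builds "word,count" lines, most frequent first, ties in ordinal word order.
    // Words contain only letters and digits, so no CSV quoting is needed.
    public static string ToCsv(Dictionary<string, int> counts)
    {
        var sb = new StringBuilder("word,count\n");
        foreach (var pair in counts.OrderByDescending(p => p.Value)
                                   .ThenBy(p => p.Key, StringComparer.Ordinal))
        {
            sb.Append(pair.Key).Append(',').Append(pair.Value).Append('\n');
        }
        return sb.ToString();
    }
}
```

For example, File.WriteAllText("counts.csv", WordFrequency.ToCsv(WordFrequency.Count(title + " " + text))) would write the counts for one news item; for many items, the per-item dictionaries could be merged before writing.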

Hope this helps!
