简体   繁体   中英

C# How do I read Excel cell XML values with character level formatting? (Multiple <t> elements within <si>)

I am working on a small application that will read the contents of an excel worksheet and import all the data as strings into a windows form datagridview.

I have successfully implemented Dietmar Schoder's code example to do this. Special thanks to him for posting it.

http://www.codeproject.com/Tips/801032/Csharp-How-To-Read-xlsx-Excel-File-With-Lines-of?fid=1864959&df=10000&mpp=50&sort=Position&spc=Relaxed&tid=5149734

I am new to XML and have been stuck on this problem for a while now.

The cells with "character level" formatting contain two or more separate <t> text values within the <si> xml element.

Here is a snippet from the excel file's sharedstrings.xml file

<si>
    <r>
      <rPr>
        <b/>
        <sz val="12"/>
        <color rgb="FFFF0000"/>
        <rFont val="Arial"/>
        <family val="2"/>
      </rPr>
      <t>Text A</t>
    </r>
    <r>
      <rPr>
        <b/>
        <sz val="12"/>
        <color theme="1"/>
        <rFont val="Arial"/>
        <family val="2"/>
      </rPr>
      <t xml:space="preserve"> Text B</t>
    </r>
  </si>

This cell contains the text "Text A Text B" but returns null because the cell has character level formatting and therefore two <t> tags. "Text A" has strike-through, colored differently or bold etc and "Text B" doesn't.

The text values are assigned with the following line of code.

Text = Workbook.SharedStrings.si[Convert.ToInt32(_value)].t;

Is there anyway to concatenate the strings from both <t> elements before assigning it to the Text variable?

Edit: I think I have no narrowed the problem down to the sharedstrings.cs file and the deserialization of the sharedstrings.xml

SharedStrings = DeserializedZipEntry<sst>(GetZipArchiveEntry(zipArchive, @"xl/sharedStrings.xml"));

sst class:

[Serializable()]
[XmlType(Namespace = "http://schemas.openxmlformats.org/spreadsheetml/2006/main")]
[XmlRoot("sst", Namespace = "http://schemas.openxmlformats.org/spreadsheetml/2006/main")]

public class sst
{
    [XmlAttribute]
    public string uniqueCount;

    [XmlAttribute]
    public string count;

    [XmlElement("si")]
    public SharedString[] si;

    public sst() { }
}

public class SharedString
{
    public string t;        
}

I have been unable to edit this class in a way that will correctly interpret both t elements text values.

Solved this one myself after studying more about xml serialization and many other similar questions to this one on here.

sst class:

public class sst
{
    [XmlAttribute]
    public string uniqueCount;

    [XmlAttribute]
    public string count;

    [XmlElement("si")]
    public SharedString[] si;



    public sst() { }
}

public class SharedString
{
    public string t;

    [XmlElement("r")]
    public NestedString[] ns;

    public SharedString() { }

}

public class NestedString
{
    public string t;
}

And the assignment of cell's text:

if (Workbook.SharedStrings.si[Convert.ToInt32(_value)].t != null)
{
   Text = Workbook.SharedStrings.si[Convert.ToInt32(_value)].t;
}
else if (Workbook.SharedStrings.si[Convert.ToInt32(_value)].ns != null)
{
   for (int i = 0; i < Workbook.SharedStrings.si[Convert.ToInt32(_value)].ns.Length; i++)
   {
     Text += Workbook.SharedStrings.si[Convert.ToInt32(_value)].ns[i].t;
   }
} 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM