How do I find tab chars in a Word .docx using OpenXML?

Background: I have a Word .docx document that contains tab characters. I want to read each paragraph and substitute a single space for each tab. I need the space as a delimiter so I can parse out things like dates, names, etc.

Problem: paragraph.InnerText doesn't return the tab characters, because each tab is stored as a separate XML element rather than as text. If I manually substitute spaces, my parsing routines work fine. With paragraph.InnerText, however, the text on either side of each tab comes back run together.
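The behavior can be reproduced without a file. The following is a minimal sketch, assuming the DocumentFormat.OpenXml NuGet package is referenced; the class name TabRepro and the sample values are hypothetical, chosen only for illustration. It builds a paragraph in memory whose single run holds text, a tab element, and more text, then prints InnerText.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TabRepro
{
    // Builds an in-memory paragraph equivalent to "Name<tab>2020-01-01".
    // The sample values are made up for this sketch.
    public static Paragraph BuildSample() =>
        new Paragraph(
            new Run(
                new Text("Name"),
                new TabChar(),              // serialized as <w:tab/>, a sibling of <w:t>
                new Text("2020-01-01")));

    public static void Main()
    {
        Paragraph para = BuildSample();

        // The tab element carries no text of its own, so InnerText
        // joins the two Text elements with nothing between them.
        Console.WriteLine(para.InnerText);
    }
}
```

Because the tab contributes nothing to InnerText, the name and the date arrive with no delimiter between them, which is what breaks space-based parsing.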

I have not been able to get the tab characters using run.InnerText either. I have searched for examples but have found none that solve the problem.

        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filePath, false))
        {
            Body body = wordDocument.MainDocumentPart.Document.Body;

            foreach (var para in body.Elements<Paragraph>())
            {
                string s = para.InnerText;      // Tab chars are missing from InnerText
                Console.WriteLine("Paragraph: " + s);
            }
        }

        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filePath, false))
        {
            Body body = wordDocument.MainDocumentPart.Document.Body;

            foreach (var para in body.Elements<Paragraph>())
            {
                string s = "";              // Work string to build full line

                foreach (var run in para.Elements<Run>())
                {
                    //  if (this is a tab char)
                    //  {
                    //      s = s + " ";    // Yes - substitute a space
                    //  }
                    //  else    // No - this assumes there are no other XML tags like "Proof Error"
                    //  {
                    //      s = s + run.InnerText;
                    //  }
                }
                Console.WriteLine("Paragraph: " + s);
            }
        }

Solved: I am able to find the tab chars and substitute spaces. Closing.

I iterate over the child elements of each run and check each element's .LocalName property. A tab element has the local name "tab", and a text element has the local name "t".

                    // Goes inside the foreach over runs: inspect each child element of the run
                    foreach (var e in run.Elements())
                    {
                        if (e.LocalName == "tab")           // tab element: substitute a space
                        {
                            Console.WriteLine("    Element Tab: " + e.InnerText);
                            s = s + " ";
                        }
                        else if (e.LocalName == "t")        // text element: keep its text
                        {
                            Console.WriteLine("    Element Text: " + e.InnerText);
                            s = s + e.InnerText;
                        }
                        else                                // anything else is logged and skipped
                        {
                            Console.WriteLine("Drop Through RUN set: " + e.LocalName);
                        }
                    }
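The same idea can also be written against the SDK's typed classes instead of comparing LocalName strings. This is a sketch rather than the code I actually ran: it assumes the DocumentFormat.OpenXml NuGet package, and the names ParagraphText and WithTabsAsSpaces are hypothetical. It uses Descendants<Run>() instead of Elements<Run>(), which is a deliberate difference, so that runs nested inside other elements (for example hyperlinks) are included too.

```csharp
using System;
using System.Text;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphText
{
    // Returns the paragraph's text with each tab element replaced by one space.
    public static string WithTabsAsSpaces(Paragraph para)
    {
        var sb = new StringBuilder();

        foreach (Run run in para.Descendants<Run>())
        {
            foreach (var child in run.ChildElements)
            {
                if (child is TabChar)
                {
                    sb.Append(' ');         // <w:tab/> becomes a space delimiter
                }
                else if (child is Text t)
                {
                    sb.Append(t.Text);      // <w:t> contributes its text as-is
                }
                // Other children (run properties, breaks, ...) are skipped.
            }
        }

        return sb.ToString();
    }

    public static void Main()
    {
        // In-memory sample with made-up values, so no .docx file is needed.
        var para = new Paragraph(
            new Run(new Text("Name"), new TabChar(), new Text("2020-01-01")));

        Console.WriteLine(WithTabsAsSpaces(para));
    }
}
```

With a real document, the helper would be called once per paragraph from the foreach over body.Elements<Paragraph>() shown earlier.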
