
Read a docx file in C# using OpenXml

I am new to C# and OpenXml. I need help reading a .docx file and storing each paragraph in an array.

I am using OpenXml to read a Word (.docx) file. I was able to read the file and print it, but only as one concatenated string. I couldn't find a way to store each paragraph as an array of strings. (In Python, the docx library gives you the paragraphs as a list automatically; I am looking for something similar.)

using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace ConsoleApp1
{
    class Program
    {

        static void Main(string[] args)
        {
            OpenWordprocessingDocumentReadonly(@"E:\WordDocTest\Test.docx");
        }
        public static void OpenWordprocessingDocumentReadonly(string filepath)
        {
            // Open a WordprocessingDocument based on a filepath.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
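                // Assign a reference to the existing document body.
                Body body = wordDocument.MainDocumentPart.Document.Body;
                // InnerText concatenates the text of every paragraph in the body.
                Console.WriteLine(body.InnerText);
                // No explicit Close() is needed: the using block disposes the document.
            }
        }
    }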
}

Test.docx looks like this:

1. Test

This is Test 1.
Test1 part a.

2. noTest

This is Test2.

The output that I got was: TestThis is Test 1.Test1 part a.noTestThis is Test 2.
What I want to learn is how to store each paragraph or line in an array of strings and iterate through that array.

You can avoid using arrays and instead unleash the wonderful power of OpenXml combined with LINQ and lists. If you want to work with paragraphs, you could get them as a sequence like this (this needs using System.Linq; append ToList() if you want an actual list):

var paras = body.OfType<Paragraph>();
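To get the paragraph text itself, closer to what the Python docx library returns, one option is to project each paragraph to its InnerText. The following is a minimal sketch, not the only way to do it; the class and method names (ParagraphReader, GetParagraphTexts, ReadParagraphs) are made up for illustration, and it uses Elements<Paragraph>(), which, like OfType<Paragraph>() on the body, only sees top-level paragraphs (not those nested inside tables).

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, named for illustration only.
public static class ParagraphReader
{
    // Returns the text of each top-level paragraph in the body, in document order.
    public static List<string> GetParagraphTexts(Body body)
    {
        return body.Elements<Paragraph>()
                   .Select(p => p.InnerText)
                   .ToList();
    }

    // Opens the file read-only and collects the paragraph texts.
    // The list is built inside the using block, before the document is disposed.
    public static List<string> ReadParagraphs(string filepath)
    {
        using (WordprocessingDocument wordDocument =
            WordprocessingDocument.Open(filepath, false))
        {
            Body body = wordDocument.MainDocumentPart.Document.Body;
            return GetParagraphTexts(body);
        }
    }
}
```

You could then write foreach (string text in ParagraphReader.ReadParagraphs(@"E:\WordDocTest\Test.docx")) Console.WriteLine(text); to print one paragraph per line. Blank lines in the document are empty paragraphs and come back as empty strings, so you may want to filter them out. Automatic list numbers are most likely not part of the paragraph text, which would be consistent with the "1." and "2." missing from the output in the question.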

You can then expand on this to return specific elements using Where, for example:

var paras = body.OfType<Paragraph>()
    .Where(p => p.ParagraphProperties != null &&
                p.ParagraphProperties.ParagraphStyleId != null &&
                p.ParagraphProperties.ParagraphStyleId.Val.Value.Contains("Heading1"))
    .ToList();
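The same filter can be written more compactly with the null-conditional operator and combined with InnerText to return the heading text directly. This is only a sketch: HeadingFilter and GetTextsByStyle are hypothetical names, and the style id ("Heading1" here) depends on how the styles are defined in your document.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, named for illustration only.
public static class HeadingFilter
{
    // Returns the text of top-level paragraphs whose style id contains styleId.
    // Paragraphs with no properties or no style id are skipped.
    public static List<string> GetTextsByStyle(Body body, string styleId)
    {
        return body.Elements<Paragraph>()
            .Where(p => p.ParagraphProperties?.ParagraphStyleId?.Val?.Value
                         ?.Contains(styleId) == true)
            .Select(p => p.InnerText)
            .ToList();
    }
}
```

Called as HeadingFilter.GetTextsByStyle(body, "Heading1"), this gives a List<string> you can iterate over with foreach.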
