
Extract words from a doc/docx file in C#

I want to extract all the words from a Word file (doc/docx) and put them into a list. It seems that Microsoft.Office.Interop only works if I want to extract paragraphs and add them to a list.

// Requires: using System.Collections.Generic; using Microsoft.Office.Interop.Word;
List<string> data = new List<string>();

Microsoft.Office.Interop.Word.Application app =
    new Microsoft.Office.Interop.Word.Application();

Document doc = app.Documents.Open(dlg.FileName);

// Collect the trimmed text of every paragraph.
foreach (Paragraph objParagraph in doc.Paragraphs)
    data.Add(objParagraph.Range.Text.Trim());

((_Document)doc).Close();
((_Application)app).Quit();

I also found a way to extract the text word by word, but it doesn't work with big documents because the loop throws an exception.

Dictionary<int, string> motRap = new Dictionary<int, string>();
Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
Document document = application.Documents.Open("C:/Users/Titri/Desktop/test/test/bin/Debug/po.txt");

// Loop through all words in the document.
int count = document.Words.Count;
for (int i = 1; i <= count; i++)
{
    string text = document.Words[i].Text;
    motRap.Add(i, text);
}

// Close Word.
application.Quit();

So my question is: is there a way to extract the words from a big Word file? I think Microsoft.Office.Interop is not the right tool for extracting from a big file. Sorry, my English is not good.

The object inside a paragraph is called a Run, though I don't know whether it is available in Interop. For better performance, I would suggest switching to the Open XML SDK if you have to process a large number of documents.
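As a rough illustration of the Open XML SDK route, here is a minimal sketch. It assumes the DocumentFormat.OpenXml NuGet package is referenced and that the input is a .docx file (the SDK does not read the legacy binary .doc format). The names DocxWordReader and ReadWords are made up for this example. It splits each paragraph's text rather than each run's, since a single word can be spread over several runs.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the SDK.
public static class DocxWordReader
{
    // Returns every whitespace-separated word found in the document body.
    public static List<string> ReadWords(string path)
    {
        var words = new List<string>();

        // Open read-only (second argument false); Word itself is not started.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                // InnerText concatenates the text of all runs in the paragraph.
                // A null separator array makes Split use whitespace characters.
                words.AddRange(paragraph.InnerText.Split(
                    (char[])null, StringSplitOptions.RemoveEmptyEntries));
            }
        }

        return words;
    }
}
```

Something like `List<string> words = DocxWordReader.ReadWords(dlg.FileName);` would then replace the Interop loop. Punctuation stays attached to the words, and words separated only by a tab or line-break element may come out joined, because those elements contribute no characters to InnerText.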

If you want to stick to Interop, why don't you just split each paragraph into an array (delimiter obviously space) and add all the words after that?
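A minimal sketch of that splitting idea, kept independent of Interop so it can be run without Word installed. WordSplitter and ExtractWords are made-up names for this example, and the sample strings stand in for the values of objParagraph.Range.Text from the question's loop. It splits on any whitespace rather than only on spaces, which is an assumption about what counts as a word boundary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class WordSplitter
{
    // Splits each paragraph's text on whitespace and collects the words.
    public static List<string> ExtractWords(IEnumerable<string> paragraphs)
    {
        var words = new List<string>();
        foreach (string paragraph in paragraphs)
        {
            // A null separator array makes Split use whitespace characters,
            // so repeated spaces and a trailing '\r' produce no empty entries.
            words.AddRange(paragraph.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries));
        }
        return words;
    }

    public static void Main()
    {
        // Stand-in for text read from doc.Paragraphs via Interop.
        var sample = new[] { "Hello big  world\r", "second paragraph\r" };
        foreach (string word in ExtractWords(sample))
            Console.WriteLine(word);
    }
}
```

With Interop, the paragraph loop from the question could feed this directly, for example by collecting objParagraph.Range.Text into a list first. That reads the document once per paragraph instead of once per word, which is far fewer COM calls than indexing document.Words.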


 