Extracting words from a doc / docx file in C#

Question

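I want to extract all the words from a Word file (doc / docx) and put them in a list. Microsoft.Office.Interop seems to work fine as long as I only want to extract paragraphs and add them to a list.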

`List<string> data = new List<string>();

Microsoft.Office.Interop.Word.Application app =
    new Microsoft.Office.Interop.Word.Application();

Document doc = app.Documents.Open(dlg.FileName);

foreach (Paragraph objParagraph in doc.Paragraphs)
    data.Add(objParagraph.Range.Text.Trim());

((_Document)doc).Close();
((_Application)app).Quit();`

I also found a way to extract the text word by word, but it does not work for large documents because the loop throws an exception.

`Dictionary<int, string> motRap = new Dictionary<int, string>();
Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
Document document = application.Documents.Open("C:/Users/Titri/Desktop/test/test/bin/Debug/po.txt");

// Loop through all words in the document.
int count = document.Words.Count;
for (int i = 1; i <= count; i++)
{
    string text = document.Words[i].Text;
    motRap.Add(i, text);
}

// Close Word.
application.Quit();`

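So my question is: is there a way to extract the words from a large Word file? I don't think Microsoft.Office.Interop is a good tool for extracting from large files. Sorry for my poor English.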

Answer 1

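The objects inside a paragraph are called Runs, although I don't know whether they are available in Interop. To noticeably improve your experience, I suggest switching to the OpenXmlSdk in case you have to process a large number of documents.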
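As a rough illustration of the OpenXmlSdk route, here is a minimal sketch. It assumes the `DocumentFormat.OpenXml` NuGet package is referenced; the class name `DocxWordExtractor` and the method `ExtractWords` are made up for this example. Note that the SDK reads only .docx files, not the older binary .doc format, and that splitting on whitespace is just one possible definition of a "word".

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK itself.
public static class DocxWordExtractor
{
    public static List<string> ExtractWords(string path)
    {
        var words = new List<string>();

        // Open the package read-only; Word does not need to be installed.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart?.Document?.Body;
            if (body == null)
                return words;

            // Descendants<Paragraph>() also reaches paragraphs nested in tables.
            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                // InnerText concatenates the text of all runs in the paragraph.
                // A null separator array makes Split break on any whitespace.
                words.AddRange(paragraph.InnerText.Split(
                    (char[])null, StringSplitOptions.RemoveEmptyEntries));
            }
        }

        return words;
    }
}
```

Calling `DocxWordExtractor.ExtractWords(dlg.FileName)` would return the words in document order. Punctuation stays attached to the neighbouring word, and words separated only by a tab or line-break element may end up merged, so a real implementation would probably need to handle those cases.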

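If you want to stick with Interop, why not split each paragraph into an array (the separator obviously being a space) and then append all the words to your list?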
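The splitting idea could look something like the sketch below. It works on plain strings, so it can be fed the paragraph texts already collected in the question's `data` list. The names `ParagraphWordSplitter` and `SplitIntoWords` are invented for this example, and it splits on any whitespace rather than only on the space character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns paragraph texts into a flat list of words.
public static class ParagraphWordSplitter
{
    public static List<string> SplitIntoWords(IEnumerable<string> paragraphs)
    {
        var words = new List<string>();

        foreach (string paragraph in paragraphs)
        {
            // Skip empty paragraphs (blank lines in the document).
            if (string.IsNullOrWhiteSpace(paragraph))
                continue;

            // A null separator array makes Split break on any whitespace;
            // RemoveEmptyEntries drops the gaps left by repeated spaces.
            words.AddRange(paragraph.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries));
        }

        return words;
    }
}
```

With the question's first snippet, this would be used as `List<string> words = ParagraphWordSplitter.SplitIntoWords(data);`. It makes one Interop call per paragraph instead of one per word. Punctuation remains attached to the words, so it may need trimming depending on what counts as a word.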

Solution 1
1 2017-06-22 12:35:30
