How to find highlighted text from Word file in C# using Microsoft.Office.Interop.Word?

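The question would have been simple, but one added requirement has made it tricky for me. The catch is that I don't need every highlighted "word" individually; I need the highlighted "phrases" from the Word file. I have written the following code: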

using Word = Microsoft.Office.Interop.Word;

private void button1_Click(object sender, EventArgs e)
{
    try
    {
        Word.ApplicationClass wordObject = new Word.ApplicationClass();
        wordObject.Visible = false;
        object file = "D:\\mywordfile.docx";
        object nullobject = System.Reflection.Missing.Value;
        Word.Document thisDoc = wordObject.Documents.Open(ref file, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject);
        List<string> wordHighlights = new List<string>();

        //Let myRange be some Range which has my text under consideration

        int prevStart = 0;
        int prevEnd = 0;
        int thisStart = 0;
        int thisEnd = 0;
        string tempStr = "";
        foreach (Word.Range cellWordRange in myRange.Words)
        {
            // Compare against the enum value directly instead of its string name
            if (cellWordRange.HighlightColorIndex == Word.WdColorIndex.wdNoHighlight)
            {
                continue;
            }
            else
            {
                thisStart = cellWordRange.Start;
                thisEnd = cellWordRange.End;
                string cellWordText = cellWordRange.Text.Trim();
                if (cellWordText.Length >= 1)   // valid word length, non-whitespace
                {
                    if (thisStart == prevEnd)    // If this word is contiguously highlighted with previous highlighted word
                    {
                        tempStr = String.Concat(tempStr, " "+cellWordText);  // Concatenate with previous contiguously highlighted word
                    }
                    else
                    {
                        if (tempStr.Length > 0)    // If some string has been concatenated in previous iterations
                        {
                            wordHighlights.Add(tempStr);
                        }
                        tempStr = "";
                        tempStr = cellWordText;
                    }
                }
                prevStart = thisStart;
                prevEnd = thisEnd;
            }
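        }

        // Flush the last run; without this the final highlighted phrase is never added
        if (tempStr.Length > 0)
        {
            wordHighlights.Add(tempStr);
        }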

        foreach (string highlightedString in wordHighlights)
        {
            MessageBox.Show(highlightedString);
        }
    }
    catch (Exception j)
    {
        MessageBox.Show(j.Message);
    }
}

Now consider the following text:

Le thé vert a un rôle dans la diminution du cholestérol, la combustion des graisses, la prévention du diabète et les AVC, et conjurer la démence.

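Now suppose someone highlights "du cholestérol". My code obviously picks up two words, "du" and "cholestérol". How can I get the contiguously highlighted region as a single unit? I mean that "du cholestérol" should be returned as one entity in the List. Is there any logic where we scan the document char by char, marking the point where highlighting starts as the start of the selection and the point where it ends as the end of the selection?

PS: If there is a library with the required functionality in any other language, please let me know, since the scenario is not language-specific. I just need to get the desired result somehow.

Edit: Following Oliver Hanappi's suggestion, I modified the code to use Start and End. But the problem remains: if two such highlighted phrases are separated only by a space, the program treats them as a single phrase, simply because it reads Words rather than whitespace. Perhaps if (thisStart == prevEnd) needs some tweaking?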

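You can do this more efficiently with Find, which searches faster and selects all of the contiguous matching text. See the reference here: http://msdn.microsoft.com/en-us/library/office/bb258967%28v=office.12%29.aspx

Here is a VBA example that prints all highlighted text: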

Sub TestFind()
  Dim myRange As Range

  Set myRange = ActiveDocument.Content    '    search entire document

  With myRange.Find
    .Highlight = True
    Do While .Execute = True     '   loop while highlighted text is found
      Debug.Print myRange.Text   '   myRange is changed to contain the found text
    Loop
  End With
End Sub

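Hope this helps you understand better.

You can look at the Start and End properties of the ranges and check whether the end of the first range equals the start of the second.

As an alternative, you can move the range by one word (see WdUnits.wdWord) and then check whether the moved start and end equal the start and end of the second word.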
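The Start/End comparison can be sketched without Word at all. The example below is only an illustration: Span and HighlightMerger.MergeAdjacent are hypothetical names, not part of the interop API, and it assumes each highlighted piece has already been reduced to its character offsets and text, sorted by Start. Two pieces are joined only when the previous End equals the current Start, so an unhighlighted gap (for example a single space) starts a new phrase.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a Word range: character offsets plus the covered text.
public sealed class Span
{
    public Span(int start, int end, string text)
    {
        Start = start;
        End = end;
        Text = text;
    }

    public int Start { get; private set; }
    public int End { get; private set; }    // exclusive, like Range.End
    public string Text { get; private set; }
}

public static class HighlightMerger
{
    // Joins spans that touch (previous End == current Start) into one phrase.
    // Assumes the spans are sorted by Start.
    public static List<string> MergeAdjacent(IEnumerable<Span> spans)
    {
        var phrases = new List<string>();
        var current = new StringBuilder();
        int prevEnd = -1;

        foreach (Span span in spans)
        {
            // A gap between the previous span and this one closes the current phrase.
            if (current.Length > 0 && span.Start != prevEnd)
            {
                phrases.Add(current.ToString().Trim());
                current.Clear();
            }
            current.Append(span.Text);
            prevEnd = span.End;
        }

        // Do not lose the last phrase.
        if (current.Length > 0)
        {
            phrases.Add(current.ToString().Trim());
        }
        return phrases;
    }
}
```

For instance, the spans (0, 3, "du "), (3, 14, "cholestérol") and (15, 17, "la") give two phrases, "du cholestérol" and "la", because offset 14 is not covered by any span. Whether the offsets you get from Range.Words really exclude unhighlighted whitespace is something to verify against your own document.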

grahamj42's answer is fine; I translated it to C#. If you want to find matches in the whole document, use:

Word.Range content = thisDoc.Content;

But keep in mind that this is only the main story range. If you want to match text in, for example, the footnotes, you need to use:

Word.StoryRanges stories = null;
stories = thisDoc.StoryRanges;
Word.Range footnoteRange = stories[Word.WdStoryType.wdFootnotesStory];

My code:

Word.Find find = null;
Word.Range duplicate = null;
try
{
    duplicate = range.Duplicate;
    find = duplicate.Find;
    find.Highlight = 1;

    object str = "";
    object missing = System.Type.Missing;
    object objTrue = true;
    object replace = Word.WdReplace.wdReplaceNone;

    bool result = find.Execute(ref str, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref objTrue, ref str, ref replace, ref missing, ref missing, ref missing, ref missing);
    while (result)
    {
        // code to store range text
        // use duplicate.Text property
        result = find.Execute(ref str, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref objTrue, ref str, ref replace, ref missing, ref missing, ref missing, ref missing);
    }
}
finally
{
    if (find != null) Marshal.ReleaseComObject(find);
    if (duplicate != null) Marshal.ReleaseComObject(duplicate);
}
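The "code to store range text" step can be filled in roughly as follows. This is a sketch rather than a definitive implementation: HighlightFinder and Collect are hypothetical names, and the check against the original End mirrors the rangeLimit test in the VB helper further down this page, on the assumption that Find can keep matching past the range it started from. It needs a reference to Microsoft.Office.Interop.Word and an installed copy of Word to run.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

public static class HighlightFinder
{
    // Returns the text of every highlighted run that starts inside 'range'.
    public static List<string> Collect(Word.Range range)
    {
        var results = new List<string>();
        int limit = range.End;                  // remember where the caller's range ends
        Word.Range duplicate = range.Duplicate; // Find redefines the range it runs on
        Word.Find find = duplicate.Find;
        try
        {
            find.ClearFormatting();
            find.Highlight = 1;                 // match highlighted text only

            object findText = "";
            object replaceWith = "";
            object missing = System.Type.Missing;
            object forward = true;
            object wrap = Word.WdFindWrap.wdFindStop;   // do not wrap around the document
            object format = true;                       // search by formatting
            object replace = Word.WdReplace.wdReplaceNone;

            // Each successful Execute moves 'duplicate' onto the next highlighted run.
            while (find.Execute(ref findText, ref missing, ref missing, ref missing,
                                ref missing, ref missing, ref forward, ref wrap,
                                ref format, ref replaceWith, ref replace, ref missing,
                                ref missing, ref missing, ref missing)
                   && duplicate.Start < limit)
            {
                results.Add(duplicate.Text);
            }
        }
        finally
        {
            Marshal.ReleaseComObject(find);
            Marshal.ReleaseComObject(duplicate);
        }
        return results;
    }
}
```

A call such as HighlightFinder.Collect(thisDoc.Content) would cover the main story; passing footnoteRange instead would cover the footnotes.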

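I started with Oliver's logic and things looked fine, but testing showed that this approach does not take whitespace into account. As a result, highlighted phrases separated only by a space were not split apart. I then used the VB code approach provided by grahamj42, added it as a class library, and referenced it from my C# Windows Forms project.

My C# Windows Forms project: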

using Word = Microsoft.Office.Interop.Word;

Then I changed the try block to:

Word.ApplicationClass wordObject = new Word.ApplicationClass();
wordObject.Visible = false;
object file = "D:\\mywordfile.docx";
object nullobject = System.Reflection.Missing.Value;
Word.Document thisDoc = wordObject.Documents.Open(ref file, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject);

List<string> wordHighlights = new List<string>();


// Let myRange be some Range, which has been already selected programmatically here


WordMacroClasses.Highlighting macroObj = new WordMacroClasses.Highlighting();
List<string> hiWords = macroObj.HighlightRange(myRange, myRange.End);
foreach (string hitext in hiWords)
{
    wordHighlights.Add(hitext);
}

And here is the Range.Find code in the VB class library. It accepts the Range and its end position (passed as myRange.End above) and returns a List(Of String):

Public Class Highlighting
    Public Function HighlightRange(ByVal myRange As Microsoft.Office.Interop.Word.Range, ByVal rangeLimit As Integer) As List(Of String)

        Dim Highlights As New List(Of String)
        Dim i As Integer
        i = 0

        With myRange.Find
            .Highlight = True
            Do While .Execute = True     ' loop while highlighted text is found

                If (myRange.Start < rangeLimit) Then Highlights.Add(myRange.Text)

            Loop
        End With
        Return Highlights
    End Function
End Class
