
Open XML Word document - find highlighted text

I have a Word document in which every paragraph is a single very long line, something like:

"NameOfSomeSort     ----ASDdASFA---F-TEXT-FASFASFAS----FASFASF"

The characters

"TEXT"

are highlighted. I need to be able to tell which characters in the line are highlighted and get their position index within the line.

I was able to do it via Interop, but the operation takes about 5-10 hours to go through the whole document. So I tried OpenXML, but I can't get text properties such as Highlight when I cycle through the paragraph texts.

Highlight is applied to the run, inside its RunProperties ( https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.highlight(v=office.14).aspx ).

If your text is "aaaaa [i am highlight] bbbb", the OpenXML will look like:

<w:p>
  <w:r><w:t>aaaaa</w:t></w:r>
  <w:r>
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>[i am highlight]</w:t>
  </w:r>
  <w:r><w:t>bbbb</w:t></w:r>
</w:p>

So, to find which text is highlighted, you have to search for the Highlight element with something like Paragraph.Descendants<Highlight>().

If you need to retrieve the position, you can use an algorithm like:

// Suppose r is the Run that contains the highlight, inside the paragraph p you want to inspect
int pos = 0;
// Walk backwards over the siblings that precede r within the paragraph
// and add the length of their text to pos
OpenXmlElement oxe = r.PreviousSibling();
while (oxe != null)
{
  // Count only runs, so paragraph properties, bookmarks, etc. are skipped
  if (oxe is Run)
  {
    pos += oxe.InnerText.Length;
  }
  oxe = oxe.PreviousSibling();
}
// pos is now the zero-based index where the highlighted text begins (add 1 for a one-based position)
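The same idea can be checked on the raw markup without the Open XML SDK. What follows is a minimal sketch in Python using only the standard library, not the SDK code from this answer. It assumes the real WordprocessingML element names (w:p, w:r, w:t), that the runs are direct children of the paragraph, and that a highlight value of "none" means no highlight. The function name highlighted_spans and the SAMPLE string are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def highlighted_spans(paragraph):
    """Return (start, length, text) for each highlighted run.

    Assumption: only runs that are direct children of the paragraph are
    considered; runs nested in hyperlinks or similar wrappers are ignored.
    """
    spans = []
    pos = 0  # zero-based offset of the current run within the paragraph text
    for run in paragraph.findall("w:r", NS):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        highlight = run.find("w:rPr/w:highlight", NS)
        # w:val="none" explicitly switches highlighting off
        if highlight is not None and highlight.get(f"{{{W}}}val") != "none":
            spans.append((pos, len(text), text))
        pos += len(text)
    return spans


# Hypothetical sample paragraph mirroring the example text above
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>aaaaa</w:t></w:r>"
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr>'
    "<w:t>[i am highlight]</w:t></w:r>"
    "<w:r><w:t>bbbb</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    print(highlighted_spans(ET.fromstring(SAMPLE)))
    # prints [(5, 16, '[i am highlight]')]
```

For a real file, the paragraph elements would come from word/document.xml inside the .docx archive, which is an ordinary zip file. Keeping a running offset in a single forward pass avoids walking back over the previous siblings for every highlighted run, which matters when each paragraph is a very long line. Adjacent highlighted runs are reported as separate spans in this sketch.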
