
Extract bullets from a Word document using Aspose.Words in C#

I need to extract the text with the bullet style from a Word document in C#. I am using the Aspose.Words library, but a solution with a different library is also welcome. I can already load documents and extract the text with Heading 1 styling, but when I try the same with the bullet styling I get nothing.

I am using the code below to get the text with Heading1 styling, and that works.

var heading1 = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .Cast<Aspose.Words.Paragraph>()
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1);

foreach (var head1 in heading1)
{
    // GetText() ends with the paragraph mark, so trim it before display.
    listBox11.Items.Add(head1.GetText().Trim());
}

I am trying to use the code below to get the text with bullet styling, and this does NOT work.

var bullets = doc
    .GetChildNodes(NodeType.Paragraph, true)
    .Cast<Aspose.Words.Paragraph>()
    .Where(p => p.ParagraphFormat.StyleIdentifier == StyleIdentifier.ListBullet);

foreach (var bullet in bullets)
{
    listBox19.Items.Add(bullet.GetText().Trim());
}

I also tried the ListBullet2 to ListBullet5 style identifiers, but that does not fix the problem either.

Most likely your code does not work because the bullets are not applied via a style. In an MS Word document, formatting can be applied at several levels: document defaults, theme, style, and direct formatting. For example, when bullets are added with Word's Bullets button, the paragraph usually keeps a style such as Normal or List Paragraph and the bullet comes from list formatting, so filtering on StyleIdentifier.ListBullet finds nothing. In your case, I think the best way is to use the ListFormat.IsListItem property.
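To check this on a real file, a small diagnostic sketch can print each paragraph's style name next to ListFormat.IsListItem. The class and method names (ListDiagnostics, Describe) are hypothetical; ParagraphFormat.StyleName, ListFormat.IsListItem and GetText() are existing Aspose.Words members. If a bulleted paragraph shows IsListItem=True with a style other than List Bullet, the bullet comes from list formatting rather than from the style.

```csharp
using System;
using System.Linq;
using Aspose.Words;

// Hypothetical diagnostic helper: one line per paragraph showing its style name,
// whether Aspose.Words treats it as a list item, and its text.
public static class ListDiagnostics
{
    public static string Describe(Document doc)
    {
        var lines = doc
            .GetChildNodes(NodeType.Paragraph, true)
            .OfType<Paragraph>()
            .Select(p =>
                $"{p.ParagraphFormat.StyleName} | IsListItem={p.ListFormat.IsListItem} | " +
                // GetText() ends with the paragraph mark; Trim() removes it.
                p.GetText().Trim());

        return string.Join(Environment.NewLine, lines);
    }
}
```

Usage: `Console.WriteLine(ListDiagnostics.Describe(new Document(fileName)));`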

I am now using this to successfully extract the list items from a Word file and put them into a list box.

string fileName = listBox1.Items.Cast<string>().FirstOrDefault();

// Open the document.
Document doc = new Document(fileName);

// Compute the list labels (bullets and numbers) for all list paragraphs.
doc.UpdateListLabels();

NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);

// Keep only the paragraphs that are list items (bulleted or numbered).
foreach (Aspose.Words.Paragraph paragraph in paras.OfType<Aspose.Words.Paragraph>().Where(p => p.ListFormat.IsListItem))
{
    // Export the paragraph to plain text and trim paragraph marks and surrounding whitespace.
    string paragraphText = paragraph.ToString(SaveFormat.Text).Trim();

    // Remove the bullet (and the character after it) from the front of the text.
    // Guard against empty or one-character list items, where Remove(0, 2) would throw.
    string bullet = paragraphText.Length >= 2 ? paragraphText.Remove(0, 2) : paragraphText;

    listBox19.Items.Add(bullet);
}
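ListFormat.IsListItem is also true for numbered lists. If only bulleted items should end up in the list box, one option (a sketch, not tested against the files in the question) is to also check the paragraph's list level, since ListFormat.ListLevel.NumberStyle is NumberStyle.Bullet for bullet levels. This variant reads GetText(), which returns the paragraph's own text without the generated list label, so no characters have to be cut off. BulletExtractor and ExtractBulletTexts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

// Hypothetical helper: collects the text of bulleted paragraphs only,
// regardless of which paragraph style (Normal, List Paragraph, ...) they use.
public static class BulletExtractor
{
    public static List<string> ExtractBulletTexts(Document doc)
    {
        return doc
            .GetChildNodes(NodeType.Paragraph, true)
            .OfType<Paragraph>()
            // IsListItem matches bulleted and numbered paragraphs; ListLevel is
            // non-null only for list items, so check IsListItem first.
            .Where(p => p.ListFormat.IsListItem
                     && p.ListFormat.ListLevel.NumberStyle == NumberStyle.Bullet)
            // GetText() ends with the paragraph mark; Trim() removes it.
            .Select(p => p.GetText().Trim())
            .ToList();
    }
}
```

Usage: `foreach (string text in BulletExtractor.ExtractBulletTexts(doc)) listBox19.Items.Add(text);`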

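Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.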

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM