
How to check if a Word file contains text in C#

I am processing some Word files, and I would like to check whether the file being processed contains anything other than shapes. In my case, that would be plain text.

I know how to detect whether the file contains shapes etc., but I am not sure how to check whether a document contains text.

string path = "C:/Users/Test/Desktop/Test/";
foreach (string file in Directory.EnumerateFiles(path, "*.docx"))
{
    var fileInfo = new FileInfo(file);

    // Skip the temporary lock files Word creates next to open documents
    if (!fileInfo.Name.StartsWith("~$"))
    {
        var wordApplication = new Microsoft.Office.Interop.Word.Application();
        var document = wordApplication.Documents.Open(file);

        if (document.Content.Text.Contains(""))
        {
            Console.WriteLine(document.Name);
        }
    }
}

Maybe something like that, to check whether the document does not contain anything?

However, when I process one Word file that has text and one that has no text, both are shown in the console.

You can count the number of words in the Word document.

if (document.Words.Count <= 0)
{
    Console.WriteLine(document.Name);
}
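One caveat worth keeping in mind with both checks: in .NET, `Contains("")` returns true for every string, which would explain why both files are printed in the question. In addition, an otherwise empty Word document usually still holds a final paragraph mark, and the `Words` collection is documented as including paragraph marks and punctuation in its count, so an "empty" document may not report zero words. A minimal sketch of a stricter test on `document.Content.Text` follows; the class `WordTextHelper` and the method `HasVisibleText` are hypothetical names, not part of the interop API.

```csharp
using System;
using System.Linq;

public static class WordTextHelper
{
    // Hypothetical helper: returns true if the string holds at least one
    // character that is neither whitespace nor a control character.
    // A lone paragraph mark ("\r") therefore counts as "no text".
    public static bool HasVisibleText(string content)
    {
        return !string.IsNullOrEmpty(content)
            && content.Any(c => !char.IsWhiteSpace(c) && !char.IsControl(c));
    }
}
```

In the loop from the question this could be called as `if (WordTextHelper.HasVisibleText(document.Content.Text)) { Console.WriteLine(document.Name); }`. Treat the result as a heuristic: how shapes and other embedded objects are represented in `Content.Text` is not covered here.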

You can use the Open XML SDK from Microsoft to look for specific elements inside a Word document. This does not require Office to be installed on the machine where your program is running.

For finding shapes, the question "How to get list of shapes in SdtBlock element using Open XML SDK?" gives a nice sample.

To give you an idea, you can iterate through the elements of the document, as in this sample, to decide whether the Word file is suitable for processing or not. Please note that this code only sketches the idea.

        using (var package = WordprocessingDocument.Open(wordFileStream, false))
        {
            OpenXmlElement body = package.MainDocumentPart.Document.Body;
            // Descendants() walks the whole tree. Elements() would only return
            // the body's direct children and would never reach a "t" element.
            foreach (OpenXmlElement element in body.Descendants())
            {
                switch (element.LocalName)
                {
                    case "t":       // Text
                        // we have found text
                        break;
                    case "cr":      // Carriage return
                    case "br":      // Break (line, page or column)
                        // we have found a carriage return or a break
                        break;
                    case "p":
                        // we have found a paragraph
                        break;
                    default:
                        // we have found something else
                        break;
                }
            }
        }
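If the only question is whether the document holds any text at all, the walk can be narrowed to text elements. The following is a minimal sketch, assuming the Open XML SDK (the `DocumentFormat.OpenXml` package) is referenced; the class `OpenXmlTextCheck` and the method `ContainsText` are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class OpenXmlTextCheck
{
    // Hypothetical helper: returns true if the main document body contains
    // at least one text element that is not blank.
    public static bool ContainsText(string path)
    {
        // Open read-only; the using block releases the file handle afterwards.
        using (var package = WordprocessingDocument.Open(path, false))
        {
            var body = package.MainDocumentPart?.Document?.Body;
            if (body == null)
            {
                return false;
            }

            // Descendants<Text>() searches the whole element tree under the body.
            return body.Descendants<Text>()
                       .Any(t => !string.IsNullOrWhiteSpace(t.Text));
        }
    }
}
```

A call such as `OpenXmlTextCheck.ContainsText(file)` could replace the interop check inside the `foreach` loop from the question. Note that this looks at the main body only, so headers and footers are ignored, and text placed inside text boxes may also be found because those text elements are nested under the body as well.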

A reference for shapes can be found here.
