Extract text between two strings from a Word document using Aspose.Words in C#

Question

I have a Word document from which I need to extract a few lines of text. The text I need is located between two strings: "must haves" and "could haves". Does anyone know how to do this?

Answer 1

You can use IReplacingCallback to do this. For example, see the following code:

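using System;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\temp\in.docx");
FindReplaceOptions opt = new FindReplaceOptions();
opt.ReplacingCallback = new MyReplacingCallback();
// Capture everything between the two markers; (.*?) is non-greedy, and
// Singleline lets '.' also match line/paragraph break characters.
Regex regex = new Regex(@"must haves(.*?)could haves", RegexOptions.Singleline);
doc.Range.Replace(regex, "", opt);

class MyReplacingCallback : IReplacingCallback
{
    public ReplaceAction Replacing(ReplacingArgs args)
    {
        // Group 1 holds the text between "must haves" and "could haves"
        Console.WriteLine(args.Match.Groups[1].Value);
        // Skip leaves the document unchanged; the match is only read
        return ReplaceAction.Skip;
    }
}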
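To test a pattern outside Aspose.Words before wiring it into a callback, the same kind of non-greedy capture can be run on a plain string, for example the value of doc.Range.Text (where Aspose.Words marks paragraph ends with '\r'). This is a minimal sketch; the class MarkerRegex and its method Between are hypothetical helpers, not part of Aspose.Words.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Words): runs a
// "start(.*?)end" capture against an ordinary string.
public static class MarkerRegex
{
    // Returns the text between the first occurrence of `start` and the next
    // occurrence of `end`, or null when no such pair exists.
    public static string Between(string text, string start, string end)
    {
        // Regex.Escape treats the markers as literal text; (.*?) is non-greedy,
        // so the match stops at the first `end` that follows `start`.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        // Singleline lets '.' match '\r' and '\n' too, so the captured text
        // may span several paragraphs or lines.
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, MarkerRegex.Between(doc.Range.Text, "must haves", "could haves") would return everything between the two markers, including the paragraph breaks.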

Answer 2

Use Tika to extract the text from the .docx file: https://www.nuget.org/packages/TikaOnDotNet.TextExtractor

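using System;

var str = new TikaOnDotNet.TextExtraction.TextExtractor().Extract(@"C:\Users\Inconnu\Downloads\test.docx").Text;

const string startMarker = "must haves", endMarker = "could haves";
int start = str.IndexOf(startMarker, StringComparison.Ordinal);
// Look for the end marker only after the start marker; -1 if either is missing
int pTo = start < 0 ? -1 : str.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal);
// Text between the markers, or empty when they are not both found (Substring would otherwise throw)
string result = pTo < 0 ? string.Empty : str.Substring(start + startMarker.Length, pTo - start - startMarker.Length);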
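The IndexOf/Substring approach can also be wrapped in a small reusable helper that returns null instead of throwing when a marker is missing and ignores case by default. This is a sketch under those assumptions; MarkerText.Between is a hypothetical name, not part of TikaOnDotNet.

```csharp
using System;

// Hypothetical helper (not part of TikaOnDotNet): IndexOf/Substring
// extraction with guards for missing markers.
public static class MarkerText
{
    public static string Between(string text, string start, string end,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        int startIdx = text.IndexOf(start, comparison);
        if (startIdx < 0) return null;                // start marker missing

        int from = startIdx + start.Length;           // first char after the start marker
        int to = text.IndexOf(end, from, comparison); // search only after the start marker
        if (to < 0) return null;                      // end marker missing (or only before start)

        // Trim drops the whitespace and line breaks that usually surround the markers
        return text.Substring(from, to - from).Trim();
    }
}
```

For example, MarkerText.Between(str, "must haves", "could haves") on the Tika output. Searching for the end marker only after the start marker avoids the negative length that LastIndexOf can produce when "could haves" appears before "must haves"; remove the Trim() call if the surrounding line breaks matter.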


2 answers

Answer 1
Score 1, 2021-01-04 08:59:49

Answer 2
Score 0, 2020-12-30 17:28:03
