Replacing words in a Word document causes repeated replacement in C#

I need to create a C#/.NET program that searches a Microsoft Word document for specific words and replaces them with other words. For example, my Word file contains the text "LeadSoft IT", which should be replaced with "LeadSoft IT Limited". The problem is this: the first run correctly changes LeadSoft IT to LeadSoft IT Limited, but if I run the program again it matches LeadSoft IT inside the already-replaced text, and the result becomes LeadSoft IT Limited Limited. Can anyone suggest how to solve this with C# code that replaces words in a Word document?

If you already have some script for this, feel free to post it and I'll try and help more.

I'm not sure what functionality you're using to find the text instance, but I would suggest looking into regex, and using something like (LeadSoft IT(?! Limited)) .

Regex: https://regexr.com/ A good regex tester: https://www.regextester.com/109925

Edit: I made a Python script that uses regex to replace the instances:

import re

word_doc = "We like working " \
           "here at Leadsoft IT.\n" \
           "We are not limited here at " \
           "Leadsoft It Limited."

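# Text that every not-yet-expanded match is replaced with.
replace_str = "Leadsoft IT Limited"

# Negative lookahead: match "Leadsoft IT" only when it is not followed by
# "Limited" (with at most one character, such as a space, in between).
reg_str = r'Leadsoft IT(?!.?Limited)'

fixed_str = re.sub(reg_str, replace_str, word_doc, flags=re.IGNORECASE)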

print(fixed_str)

# Prints:
# We like working here at Leadsoft IT Limited.
# We are not limited here at Leadsoft It Limited.

Edit 2: Code re-created in C#: https://gist.github.com/Zylvian/47ecd6d1953b8d8c3900dc30645efe98

The regex checks the entire string for instances where Leadsoft IT is NOT followed by Limited, and replaces each of those instances with Leadsoft IT Limited.

The regex uses what's called a "negative lookahead", written (?!...), which makes the match succeed only if the text before it is not immediately followed by the pattern inside the lookahead. Feel free to edit the regex how you see fit, but be aware that the matching is strict: the lookahead in the Python example only allows a single optional character between Leadsoft IT and Limited.

If you want to understand the regex string better, feel free to copy it into https://www.regextester.com/ .
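Since the question asks for C#, here is a minimal sketch of the same negative-lookahead idea using System.Text.RegularExpressions. It is not the code from the gist; the class name LookaheadReplace and the method name AddLimited are made up for illustration, and the \b and \s+ in the pattern are my own tightening of the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadReplace
{
    // Matches "LeadSoft IT" as a whole phrase (\b stops it matching inside a
    // longer word) only when it is NOT followed by whitespace and "Limited".
    private static readonly Regex Pattern =
        new Regex(@"LeadSoft IT\b(?!\s+Limited)", RegexOptions.IgnoreCase);

    // Hypothetical helper: appends "Limited" wherever it is still missing.
    public static string AddLimited(string text)
    {
        return Pattern.Replace(text, "LeadSoft IT Limited");
    }

    public static void Main()
    {
        string text = "We work at LeadSoft IT. Visit LeadSoft IT Limited today.";
        string once = AddLimited(text);
        string twice = AddLimited(once);

        Console.WriteLine(once);          // We work at LeadSoft IT Limited. Visit LeadSoft IT Limited today.
        Console.WriteLine(once == twice); // True
    }
}
```

Because of RegexOptions.IgnoreCase, a match such as "Leadsoft It" is rewritten with the casing of the replacement string, so drop that option if the original casing matters.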

Let me know if that helps!

Simplistically, you can just run another replace to fix the problem you cause:

s = s.Replace("LeadSoft IT", "LeadSoft IT Limited").Replace("LeadSoft IT Limited Limited", "LeadSoft IT Limited");

If you're after a more generic fix that doesn't hard-code the problem string, check whether the string you search for is contained in the string you replace it with; if it is, the problem will occur. In that case, run a second replacement on the document that finds the result of applying the replacement to the replacement string itself, and turns it back into the plain replacement string:

var find = "LeadSoft IT";
var repl = "LeadSoft IT Limited";

// document is assumed to be a string holding the text to process
var result = document.Replace(find, repl);

// true when the replacement text itself contains the search text
var problemWillOccur = repl.Contains(find);

if (problemWillOccur)
{
    var fixProblemByFinding = repl.Replace(find, repl); // is "LeadSoft IT Limited Limited"
    result = result.Replace(fixProblemByFinding, repl);
}
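To check that this approach really is safe to run repeatedly, here is a small self-contained sketch that wraps the same two-step replacement in a method and applies it twice. The names IdempotentReplace and Apply are hypothetical, and the empty-string guard is my own addition.

```csharp
using System;

public static class IdempotentReplace
{
    // Replaces find with repl, then collapses any doubled replacement
    // (for example "LeadSoft IT Limited Limited") back to a single one.
    public static string Apply(string text, string find, string repl)
    {
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("find must not be empty", nameof(find));

        string result = text.Replace(find, repl);

        if (repl.Contains(find))
        {
            string doubled = repl.Replace(find, repl);
            result = result.Replace(doubled, repl);
        }
        return result;
    }

    public static void Main()
    {
        const string find = "LeadSoft IT";
        const string repl = "LeadSoft IT Limited";

        string firstRun = Apply("Welcome to LeadSoft IT.", find, repl);
        string secondRun = Apply(firstRun, find, repl);

        Console.WriteLine(firstRun);   // Welcome to LeadSoft IT Limited.
        Console.WriteLine(secondRun);  // Welcome to LeadSoft IT Limited.
    }
}
```

One caveat: if the document legitimately contains the doubled text already, the second replacement collapses that too.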

You may be interested in how I solved this problem. At first I was using NPOI, but it was making a mess of the document. Then I discovered that a DOCX file is simply a ZIP archive containing XML files.

https://github.com/kubala156/DociFlow/blob/main/DociFlow.Lib/Word/SeekAndReplace.cs

Usage:

var vars = new Dictionary<string, string>()
{
    { "testtag", "Test tag value" }
};
using (var doci = new DociFlow.Lib.Word.SeekAndReplace())
{
    // test.docx contains text with tag "{{testtag}}" it will be replaced with "Test tag value"
    doci.Open("test.docx");
    doci.FindAndReplace(vars, "{{", "}}");
}
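To illustrate the point that a DOCX file is a ZIP archive of XML files, here is a minimal sketch that reads the main document part with System.IO.Compression. It is not how DociFlow is implemented (I have not reproduced its code); the names DocxPeek and ReadMainDocumentXml are made up, and the path comes from the command line.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DocxPeek
{
    // Returns the raw XML of the main document part of a .docx file.
    public static string ReadMainDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // The body text of a Word document lives in word/document.xml.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: DocxPeek <file.docx>");
            return;
        }

        string xml = ReadMainDocumentXml(args[0]);
        Console.WriteLine("document.xml is " + xml.Length + " characters long");
    }
}
```

Keep in mind that Word can split a visible phrase across several runs in the XML, so a plain string replace on document.xml may miss text that looks contiguous on screen. Unusual placeholder tags like {{testtag}} do not remove that risk, so it is worth typing or pasting them in one go.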

NPOI 2.5.4 provides a ReplaceText method to help you replace placeholders in a Word file.

Here is an example: https://github.com/nissl-lab/npoi-examples/blob/main/xwpf/ReplaceTexts/Program.cs
