
How to find an exact word in a Word document using Open XML in C#?

I need to find an exact word in a Word document and replace it, using Open XML in C#. The purpose is to replace the user's personal details with a special character so that they are not visible to the reader.

For example, the user has an address entered in his form, which is stored in the database. He has also uploaded a Word document, which contains the following kind of text matching his address. My goal is to mask the address with # signs so that other users can't see it, e.g.:

 "422, Plot no. 1000/A, The Moon Residency II, Shree Nagrik Co. Op. Society, Sardarnagar, Ahmedabad.

Looking for an opportunity that surpasses in making me a personality that influences the masses and that too effectively. Organizationally, I would strive to work at a single

place with no professional switches being made and would love to work in an environment that demands constant evolution with variable domains incorporated to deal

with."

I want to replace "Co", "Op" with "#" sign. My output would be this:

"422, Plot no. 1000/A, The Moon Residency II, Shree Nagrik #. #. Society, Sardarnagar, Ahmedabad.

Looking for an opportunity that surpasses in making me a personality that influences the masses and that too effectively. Organizationally, I would strive to work at a single

place with no professional switches being made and would love to work in an environment that demands constant evolution with variable domains incorporated to deal

with. "

Now I have several questions:

  1. How can I search for a whole word? Right now my code turns "opportunity" into "##portunity" because the word contains "Op", and likewise "constant" becomes "##nstant". I need the replacement to happen only when the whole word matches.

  2. How can I match a whole line, or perhaps the whole address? The address should be replaced as a whole; if that is not possible, at least 70-80% of it should be replaced.

Currently my code to replace words in the Word file is as below.

MemoryStream m = new System.IO.MemoryStream();
// strResumeName contain my word file url
m = objBlob.GetResumeFile(strResumeName);

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(m, true))
{
    body = wordDoc.MainDocumentPart.Document.Body;
    colT = body.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>();
    foreach (DocumentFormat.OpenXml.Wordprocessing.Text c in colT)
    {
        if (c.InnerText.Trim() != String.Empty)
        {
            sb.Append(c.InnerText.Trim() + " ");
        }
    }
    string[] strParts = sb.ToString().Split(' ');
    HyperLinkList = HyperLinksList(wordDoc);
    redactionTags = GetReductionstrings(strParts);
}
using (Novacode.DocX document = Novacode.DocX.Load(m))
{
    // objCandidateLogin.Address contain my address
    if (!String.IsNullOrEmpty(objCandidateLogin.Address))
    {
        string[] strParts = objCandidateLogin.Address.Replace(",", " ").Split(' ');
        for (int I = 0; I <= strParts.Length - 1; I++)
        {
            if (strParts[I].Trim().Length > 1)
            {
                document.ReplaceText(strParts[I].Trim(), "#############", false, RegexOptions.IgnoreCase);
            }
        }

    }
}

You're using Open XML together with Novacode (DocX); you should consider using just Open XML.

To replace text with "#", iterate through all paragraphs in the Word document and check the Text elements within them; wherever the text you're looking for exists, replace it.

Nothing else to it. Hope this helps.

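// wordDoc is an open WordprocessingDocument
IEnumerable<Paragraph> paragraphs = wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>();
foreach (Paragraph para in paragraphs)
{
    // A paragraph can hold several Text elements, so visit each one
    foreach (Text text in para.Descendants<Text>())
    {
        // Code to replace the matching part of text.Text with "#"
    }
}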

I've written this code from memory, but if you proceed along these lines, you will find your solution.
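To keep "Op" from matching inside "opportunity", the replacement inside that loop can use a regular expression with `\b` word boundaries. The following is only a minimal sketch under some assumptions: the class and method names (`WholeWordMasker`, `BuildPattern`, `Mask`, `MaskDocument`) are made up for illustration, and each word is replaced by a single "#" as in the question's expected output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class WholeWordMasker
{
    // Builds one pattern such as \b(?:Co|Op)\b so that "Op" is matched
    // only as a whole word and never inside "opportunity".
    public static Regex BuildPattern(IEnumerable<string> words)
    {
        List<string> escaped = words
            .Where(w => !string.IsNullOrWhiteSpace(w))
            .Select(w => Regex.Escape(w.Trim()))
            .ToList();
        if (escaped.Count == 0)
        {
            throw new ArgumentException("At least one word is required.", nameof(words));
        }
        return new Regex(@"\b(?:" + string.Join("|", escaped) + @")\b", RegexOptions.IgnoreCase);
    }

    // Replaces every whole-word match in a plain string with a single '#'.
    public static string Mask(string input, Regex pattern)
    {
        return pattern.Replace(input, "#");
    }

    // Applies the mask to every Text element in the main document body.
    public static void MaskDocument(WordprocessingDocument wordDoc, IEnumerable<string> words)
    {
        Regex pattern = BuildPattern(words);
        foreach (Text text in wordDoc.MainDocumentPart.Document.Body.Descendants<Text>())
        {
            text.Text = Mask(text.Text, pattern);
        }
    }
}
```

Calling `WholeWordMasker.MaskDocument(wordDoc, new[] { "Co", "Op" })` would then leave "opportunity" and "constant" untouched. One caveat: Word can split a single word across several runs, and therefore several Text elements, so matching one Text element at a time may miss such words.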

You can use the TextReplacer class in PowerTools for Open XML to accomplish what you want. Then you can do something like this:

using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using System.IO;

namespace SearchAndReplace
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open("Test01.docx", true))
                TextReplacer.SearchAndReplace(wordDoc:doc, search:"the", replace:"this", matchCase:false);
        }
    }
}

To install the NuGet package for Open XML PowerTools, run the following command in the Package Manager Console:

PM> Install-Package OpenXmlPowerTools
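For the second question (replacing the address as a whole), one option is to pass the complete address string as the search text in a single call. This is a rough sketch, not a tested solution: `AddressRedactor` and `RedactAddress` are hypothetical names, and it assumes the address in the document is written exactly as it is stored in the database (same punctuation and spacing, no line break in the middle). If the two differ, this call will not find it.

```csharp
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper built on the same SearchAndReplace call shown above.
internal static class AddressRedactor
{
    // Opens the document for editing and replaces the full address with "###".
    public static void RedactAddress(string docxPath, string address)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            TextReplacer.SearchAndReplace(wordDoc: doc, search: address, replace: "###", matchCase: false);
        }
    }
}
```

Note that this is a plain text search, so it does not by itself restrict matches to whole words; searching for the full address simply makes accidental partial matches much less likely than searching for short pieces such as "Op".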

PowerTools for Open XML has a class for searching and replacing text in an Open XML document. Get it from here: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/08/04/introducing-textreplacer-a-new-class-for-powertools-for-open-xml.aspx

Hope this helps.

