简体   繁体   中英

Best way to replace tokens in a large text template

I have a large text template which needs tokenized sections replaced by other text. The tokens look something like this: ##USERNAME##. My first instinct is just to use String.Replace(), but is there a better, more efficient way or is Replace() already optimized for this?

System.Text.RegularExpressions.Regex.Replace() is what you seek - IF your tokens are odd enough that you need a regex to find them.

Some kind soul did some performance testing , and between Regex.Replace(), String.Replace(), and StringBuilder.Replace(), String.Replace() actually came out on top.

The only situation in which I've had to do this is sending a templated e-mail. In .NET this is provided out of the box by the MailDefinition class . So this is how you create a templated message:

MailDefinition md = new MailDefinition();
md.BodyFileName = pathToTemplate;
md.From = "test@somedomain.com";

ListDictionary replacements = new ListDictionary();
replacements.Add("<%To%>", someValue);
// continue adding replacements

MailMessage msg = md.CreateMailMessage("test@someotherdomain.com", replacements, this);

After this, msg.Body would be created by substituting the values in the template. I guess you can take a look at MailDefinition.CreateMailMessage() with Reflector :). Sorry for being a little off-topic, but if this is your scenario I think it's the easiest way.

Well, depending on how many variables you have in your template, how many templates you have, etc. this might be a work for a full template processor. The only one I've ever used for .NET is NVelocity , but I'm sure there must be scores of others out there, most of them linked to some web framework or another.

FastReplacer implements token replacement in O(n*log(n) + m) time and uses 3x the memory of the original string.

FastReplacer is good for executing many Replace operations on a large string when performance is important.

The main idea is to avoid modifying existing text or allocating new memory every time a string is replaced.

We have designed FastReplacer to help us on a project where we had to generate a large text with a large number of append and replace operations. The first version of the application took 20 seconds to generate the text using StringBuilder. The second improved version that used the String class took 10 seconds. Then we implemented FastReplacer and the duration dropped to 0.1 seconds.

string.Replace is fine. I'd prefer using a Regex, but I'm *** for regular expressions.

The thing to keep in mind is how big these templates are. If its real big, and memory is an issue, you might want to create a custom tokenizer that acts on a stream. That way you only hold a small part of the file in memory while you manipulate it.

But, for the naiive implementation, string.Replace should be fine.

如果要对大字符串进行多次替换,则最好使用StringBuilder.Replace(),因为通常会出现字符串性能问题。

Had to do something similar recently. What I did was:

  • make a method that takes a dictionary (key = token name, value = the text you need to insert)
  • Get all matches to your token format (##.+?## in your case I guess, not that good at regular expressions :P) using Regex.Matches(input, regular expression)
  • foreach over the results, using the dictionary to find the insert value for your token.
  • return result.

Done ;-)

If you want to test your regexes I can suggest the regulator.

Regular expressions would be the quickest solution to code up but if you have many different tokens then it will get slower. If performance is not an issue then use this option.

A better approach would be to define token, like your "##" that you can scan for in the text. Then select what to replace from a hash table with the text that follows the token as the key.

If this is part of a build script then nAnt has a great feature for doing this called Filter Chains . The code for that is open source so you could look at how its done for a fast implementation.

If your template is large and you have lots of tokens, you probably don't want walk it and replace the token in the template one by one as that would result in an O(N * M) operation where N is the size of the template and M is the number of tokens to replace.

The following method accepts a template and a dictionary of the keys value pairs you wish to replace. By initializing the StringBuilder to slightly larger than the size of the template, it should result in an O(N) operation (ie it shouldn't have to grow itself log N times).

Finally, you can move the building of the tokens into a Singleton as it only needs to be generated once.

static string SimpleTemplate(string template, Dictionary<string, string> replacements)
{
   // parse the message into an array of tokens
   Regex regex = new Regex("(##[^#]+##)");
   string[] tokens = regex.Split(template);

   // the new message from the tokens
   var sb = new StringBuilder((int)((double)template.Length * 1.1));
   foreach (string token in tokens)
      sb.Append(replacements.ContainsKey(token) ? replacements[token] : token);

   return sb.ToString();
}

This is an ideal use of Regular Expressions. Check out this helpful website , the .Net Regular Expressions class , and this very helpful book Mastering Regular Expressions .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM