Is there a better way to count string format placeholders in a string in C#?

I have a template string and an array of parameters that come from different sources but need to be matched up to create a new "filled-in" string:

string templateString = GetTemplate();   // e.g. "Mr {0} has a {1}"
string[] dataItems = GetDataItems();     // e.g. {"Jones", "ceiling cat"}

string resultingString = String.Format(templateString, dataItems);
// e.g. "Mr Jones has a ceiling cat"

With this code, I'm assuming that the number of string format placeholders in the template will equal the number of data items. It's generally a fair assumption in my case, but I want to be able to produce a resultingString without failing even if the assumption is wrong. I don't mind if there are empty spaces for missing data.

If there are too many items in dataItems, the String.Format method handles it fine. If there aren't enough, I get an exception (a FormatException).
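To make the asymmetry concrete, here is a small check (FormatArgCountDemo is just an illustrative name, not part of any API): a surplus argument is ignored, while a missing one makes String.Format throw FormatException.

```csharp
using System;

static class FormatArgCountDemo
{
    // Three items for a template that only references {0} and {1}:
    // the unused third item is simply ignored.
    public static string TooMany()
    {
        return String.Format("Mr {0} has a {1}", "Jones", "ceiling cat", "unused");
    }

    // One item for a template that also references {1}: String.Format throws.
    public static bool TooFewThrows()
    {
        try
        {
            String.Format("Mr {0} has a {1}", "Jones");
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```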

To overcome this, I'm counting the number of placeholders and adding new items to the dataItems array if there aren't enough.
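A minimal sketch of the padding step, assuming the required number of arguments has already been worked out (for example by a counting routine like the one below). DataItemPadding and PadTo are hypothetical names:

```csharp
using System;

static class DataItemPadding
{
    // Hypothetical helper: returns `items` unchanged if it already has at least
    // `requiredCount` elements; otherwise returns a longer copy whose extra slots
    // are string.Empty, so String.Format never runs out of arguments.
    // `requiredCount` is assumed to come from some placeholder-counting step.
    public static string[] PadTo(string[] items, int requiredCount)
    {
        if (items.Length >= requiredCount)
            return items;

        string[] padded = new string[requiredCount];
        Array.Copy(items, padded, items.Length);
        for (int i = items.Length; i < requiredCount; i++)
            padded[i] = string.Empty; // missing data shows up as an empty gap
        return padded;
    }
}
```

For example, string.Format("Mr {0} has a {1}", DataItemPadding.PadTo(new[] { "Jones" }, 2)) produces "Mr Jones has a ", with an empty gap where the missing item would go.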

To count the placeholders, the code I'm working with at the moment is:

private static int CountOccurrences(string haystack)
{
    // Count every closing brace "}" in the template.
    int count = 0;
    int i = 0;
    while ((i = haystack.IndexOf('}', i)) != -1)
    {
        i++;      // continue searching just after this brace
        count++;
    }
    return count;
}

Obviously this assumes that every closing curly brace belongs to a format placeholder. It also just feels wrong. :)

Is there a better way to count the string format placeholders in a string?


A number of people have correctly pointed out that the answer I marked as correct won't work in many circumstances. The main reasons are:

  • Regexes that count the number of placeholders don't account for literal (escaped) braces ({{0}})
  • Counting placeholders doesn't account for repeated or skipped placeholders (e.g. "{0} has a {1} which also has a {1}")

Counting the placeholders doesn't help. Consider the following cases:

"{0} ... {1} ... {0}" - needs 2 values

"{1} {3}" - needs 4 values, two of which are ignored

The second example isn't far-fetched.

For example, you may have something like this in US English:

String.Format("{0} {1} {2} has a {3}", firstName, middleName, lastName, animal);

In some cultures, the middle name may not be used, and you may have:

String.Format("{0} {2} ... {3}", firstName, middleName, lastName, animal);

If you want to do this, you need to look for the format items {index[,length][:formatString]} with the maximum index, ignoring escaped braces (e.g. {{n}}). Doubled braces are used to insert literal braces into the output string. I'll leave the coding as an exercise :), but I don't think it can or should be done with a regex in the most general case (i.e. with length and/or formatString).

And even if you aren't using length or formatString today, a future developer may think it's an innocuous change to add one; it would be a shame for this to break your code.

I would try to mimic the code in StringBuilder.AppendFormat (which is called by String.Format), even though it's a bit ugly; use Lutz Roeder's .NET Reflector to get this code. Basically, iterate through the string looking for format items, and get the index value of each one.
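A rough sketch of that approach, written from scratch rather than copied from the framework source, so treat it as an approximation: FormatItemScanner.GetMinimumArgCount is a hypothetical name, and its simplifications are listed in the comments. It walks the string once, skips {{ and }} escapes, and tracks the largest index it sees.

```csharp
using System;

static class FormatItemScanner
{
    // Hypothetical helper, loosely modelled on the way StringBuilder.AppendFormat walks a
    // composite format string. Returns the highest argument index + 1 (the minimum number
    // of arguments String.Format needs), or 0 when the string has no format items.
    // Simplifications (assumptions): the first '}' after the index ends the item, escaped
    // braces inside the ":formatString" part are not handled, validation is looser than
    // String.Format's, and a huge index makes int.Parse throw OverflowException.
    public static int GetMinimumArgCount(string format)
    {
        int maxIndex = -1;
        int i = 0;
        while (i < format.Length)
        {
            char c = format[i];
            if (c == '{')
            {
                // "{{" is an escaped literal brace, not a format item.
                if (i + 1 < format.Length && format[i + 1] == '{')
                {
                    i += 2;
                    continue;
                }

                // Read the index: ASCII digits immediately after the opening brace.
                int start = ++i;
                while (i < format.Length && format[i] >= '0' && format[i] <= '9')
                    i++;
                if (i == start)
                    throw new FormatException("Missing argument index at position " + start);
                int index = int.Parse(format.Substring(start, i - start));
                maxIndex = Math.Max(maxIndex, index);

                // Skip any ",alignment" and ":formatString" parts up to the closing brace.
                int close = format.IndexOf('}', i);
                if (close < 0)
                    throw new FormatException("Unclosed format item at position " + (start - 1));
                i = close + 1;
            }
            else if (c == '}')
            {
                // "}}" is an escaped literal brace; a lone '}' is invalid.
                if (i + 1 < format.Length && format[i + 1] == '}')
                {
                    i += 2;
                    continue;
                }
                throw new FormatException("Unescaped '}' at position " + i);
            }
            else
            {
                i++;
            }
        }
        return maxIndex + 1;
    }
}
```

On the two cases above, "{0} ... {1} ... {0}" gives 2 and "{1} {3}" gives 4, because the result is driven by the largest index rather than by how many items appear.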

Merging Damovisa's and Joe's answers. I've updated the answer after Aydsman's and activa's comments.

// requires: using System.Linq; using System.Text.RegularExpressions;
int count = Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")  // select all placeholders, with the placeholder ID as a separate group
                 .Cast<Match>() // cast MatchCollection to IEnumerable<Match> so we can use LINQ
                 .Max(m => int.Parse(m.Groups[1].Value)) + 1; // take the maximum of the first group (the placeholder ID) converted to int

This approach will work for templates like:

"{0} aa {2} bb {1}" => count = 3

"{4} aa {0} bb {0}, {0}" => count = 5

"{0} {3} , {{7}}" => count = 4

You can always use Regex:

using System;
using System.Linq;
using System.Text.RegularExpressions;
// ... more code
string templateString = "{0} {2} .{{99}}. {3}";
Match match = Regex.Matches(templateString,
             @"(?<!\{)\{(?<number>[0-9]+).*?\}(?!\})")
            .Cast<Match>()
            .OrderBy(m => int.Parse(m.Groups["number"].Value)) // sort numerically; as strings, "10" would sort before "9"
            .LastOrDefault();
Console.WriteLine(match.Groups["number"].Value); // Display 3

Marqus' answer fails if there are no placeholders in the template string.

The addition of the .DefaultIfEmpty() and m==null conditional resolves this issue.

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")
     .Cast<Match>()
     .DefaultIfEmpty()
     .Max(m => m==null?-1:int.Parse(m.Groups[1].Value)) + 1;

There is a problem with the regex proposed above in that it will match on "{0}}":

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")
...

The problem is that, when looking for the closing }, it uses .*?, which can consume a } as part of the match. Changing that part so it stops at the first } makes the suffix check (?!}) work. In other words, use this regex:

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+)[^\}]*?\}(?!\})")
...
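A quick way to check the difference between the two patterns (PlaceholderRegexes is an illustrative name; the patterns themselves are copied from this thread):

```csharp
using System.Text.RegularExpressions;

static class PlaceholderRegexes
{
    // Regex used in the earlier answers: the lazy ".*?" is allowed to swallow a '}',
    // so "{0}}" still produces a match.
    public static readonly Regex Original =
        new Regex(@"(?<!\{)\{([0-9]+).*?\}(?!})");

    // Revised regex: "[^\}]*?" cannot step over a '}', so the trailing "(?!\})"
    // lookahead now rejects "{0}}".
    public static readonly Regex Revised =
        new Regex(@"(?<!\{)\{([0-9]+)[^\}]*?\}(?!\})");
}
```

Original.IsMatch("{0}}") is true while Revised.IsMatch("{0}}") is false, and both still accept "{0:N2}". Note that neither finds the item in "{{{0}}}", which is a valid template (the value wrapped in literal braces): the lookbehind sees the preceding '{' and rejects it, which is the kind of edge case that makes regex-based counting fragile.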

I made a couple of static functions based on all this; maybe you'll find them useful.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringFormat
{
    static readonly Regex FormatSpecifierRegex = new Regex(@"(?<!\{)\{([0-9]+)[^\}]*?\}(?!\})", RegexOptions.Compiled);

    public static IEnumerable<int> EnumerateArgIndexes(string formatString)
    {
        return FormatSpecifierRegex.Matches(formatString)
         .Cast<Match>()
         .Select(m => int.Parse(m.Groups[1].Value));
    }

    /// <summary>
    /// Finds all the String.Format data specifiers ({0}, {1}, etc.), and returns the
    /// highest index plus one (since they are 0-based).  This lets you know how many data
    /// arguments you need to provide to String.Format in an IEnumerable without getting an
    /// exception - handy if you want to adjust the data at runtime.
    /// </summary>
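    /// <param name="formatString">The composite format string to inspect.</param>
    /// <returns>The highest argument index plus one, or 0 if there are no placeholders.</returns>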
    public static int GetMinimumArgCount(string formatString)
    {
        return EnumerateArgIndexes(formatString).DefaultIfEmpty(-1).Max() + 1;
    }

}

Not actually an answer to your question, but a possible solution to your problem (albeit not a perfectly elegant one); you could pad your dataItems collection with a number of string.Empty instances, since string.Format does not care about redundant items.

Perhaps you are trying to crack a nut with a sledgehammer? 也许你正试图用大锤破解坚果?

Why not just put a try/catch around your call to String.Format?

It's a bit ugly, but it solves your problem in a way that requires minimal effort and minimal testing, and it is guaranteed to work even if there is something else about format strings that you didn't consider (like {{ literals, or more complex format strings with non-numeric characters inside them: {0:$#,##0.00;($#,##0.00);Zero}).
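A minimal sketch of that try/catch wrapper. SafeFormatter.FormatOrTemplate is a hypothetical name, and returning the raw template on failure is an assumption; you might prefer to log the problem, or to retry with padded arguments:

```csharp
using System;

static class SafeFormatter
{
    // Hypothetical helper: format normally, but if the template references an argument
    // that doesn't exist (or is otherwise malformed), return the template unchanged
    // instead of letting the FormatException escape.
    public static string FormatOrTemplate(string template, params object[] args)
    {
        try
        {
            return string.Format(template, args);
        }
        catch (FormatException)
        {
            // Assumption: showing the raw template is acceptable when formatting fails.
            return template;
        }
    }
}
```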

(And yes, this means you won't detect more data items than format specifiers, but is this a problem? Presumably the user of your software will notice that they have truncated their output and rectify their format string?)

Very late to the question, but I happened upon this from another tangent.

String.Format is problematic even with unit testing (i.e. a missing argument). A developer puts in the wrong positional placeholder, or the format string is edited; it compiles fine, but it is used in another code location or even another assembly, and you get a FormatException at runtime. Ideally, unit or integration tests should catch this.

While this isn't a solution to the question, it is a workaround. You can make a helper method that accepts the format string and a list (or array) of objects. Inside the helper method, pad the list to a predefined fixed length that exceeds the number of placeholders in your messages. In the example below, assume that 10 placeholders are sufficient. The padding element can be null or a string like "[Missing]".

int q = 123456, r = 76543;
List<object> args = new List<object>() { q, r};     

string msg = "Sample Message q = {2:0,0} r = {1:0,0}";

//Logic inside the helper function
int upperBound = args.Count;
int max = 10;

for (int x = upperBound; x < max; x++)
{
    args.Add(null); //"[No Value]"
}
//Return formatted string   
Console.WriteLine(string.Format(msg, args.ToArray()));

Is this ideal? Nope, but for logging and some other use cases it is an acceptable way to prevent the runtime exception. You could even replace the null element with "[No Value]" and/or include the array position in it, then test the formatted string for "No Value" and log it as an issue.
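One way to sketch the "[No Value]" variant, assuming a fixed upper bound on the number of placeholders (MarkedFormatter.Format and its parameters are hypothetical). Each marker carries the position of the missing argument, and the helper reports whether any marker ended up in the output:

```csharp
using System.Collections.Generic;

static class MarkedFormatter
{
    // Hypothetical helper: pads `args` up to `max` entries with "[No Value n]" markers
    // (n = the position of the missing argument), formats the template, and reports
    // whether any marker made it into the output so the caller can log the problem.
    // A template that references an index >= max still throws FormatException.
    public static string Format(string template, IList<object> args, int max, out bool missing)
    {
        var padded = new List<object>(args);
        for (int i = padded.Count; i < max; i++)
            padded.Add("[No Value " + i + "]");

        string result = string.Format(template, padded.ToArray());
        missing = result.Contains("[No Value ");
        return result;
    }
}
```

Because strings ignore format strings such as 0,0, a marker that lands in a slot like {2:0,0} is printed as-is instead of causing an exception.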

Since I don't have permission to edit posts, I'll propose my shorter (and correct) version of Marqus' answer:

int num = Regex.Matches(templateString,@"(?<!\{)\{([0-9]+).*?\}(?!})")
             .Cast<Match>()
             .Max(m => int.Parse(m.Groups[1].Value)) + 1; // Groups[1] is the index; Groups[0] is the whole "{n}" match

I'm using the regex proposed by Aydsman, but haven't tested it.

Based on this answer and David White's answer, here is an updated version:

string formatString = "Hello {0:C} Bye {{300}} {0,2} {34}";
//string formatString = "Hello";
//string formatString = null;

int n;
var countOfParams = Regex.Matches(formatString?.Replace("{{", "").Replace("}}", "") ?? "", @"\{([0-9]+)")
    .OfType<Match>()
    .DefaultIfEmpty()
    .Max(m => Int32.TryParse(m?.Groups[1]?.Value, out n) ? n : -1 )
    + 1;

Console.Write(countOfParams);

Things to note:

  1. Replacing is a more straightforward way to take care of double curly braces. This is similar to how StringBuilder.AppendFormatHelper takes care of them internally.
  2. As we are eliminating '{{' and '}}', the regex can be simplified to '\{([0-9]+)'.
  3. This will work even if formatString is null.
  4. This will work even if there is an invalid index, say '{3444444456}'. Normally this would cause an integer overflow; here Int32.TryParse simply returns false and the item is ignored.

You could "abuse" ICustomFormatter to gather the placeholders and return them to the caller. This simply reuses the built-in parsing algorithm instead of trying to reimplement it (and possibly deviating from the built-in algorithm).

using System;
using System.Collections.Generic;
using System.Linq;

namespace FormatPlaceholders {

    class Program {

        class FormatSnooper : IFormatProvider, ICustomFormatter {

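            public object GetFormat(Type formatType) {
                // Only supply ourselves when String.Format asks for a custom formatter.
                return formatType == typeof(ICustomFormatter) ? this : null;
            }

            public string Format(string format, object arg, IFormatProvider formatProvider) {
                // Each argument is its own index (see GetFormatPlaceholders), so record it with its format string.
                Placeholders.Add(((int)arg, format));
                // Return "" rather than null: null makes String.Format fall back to formatting the int
                // argument itself, which throws for format strings meant for other types (e.g. "T").
                return string.Empty;
            }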

            internal readonly List<(int index, string format)> Placeholders = new List<(int index, string format)>();

        }

        public static IEnumerable<(int index, string format)> GetFormatPlaceholders(string format, int max_count = 100) {

            var snooper = new FormatSnooper();

            string.Format(
                snooper,
                format,
                Enumerable.Range(0, max_count).Cast<object>().ToArray()
            );

            return snooper.Placeholders;

        }

        static void Main(string[] args) {
            foreach (var (index, format) in GetFormatPlaceholders("{1:foo}{4:bar}{1:baz}"))
                Console.WriteLine($"{index}: {format}");
        }

    }

}

Which prints:

1: foo
4: bar
1: baz

You can then easily find the maximum index, the count, "holes", etc.
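For example, given (index, format) pairs like the ones FormatSnooper collects, a few LINQ queries are enough. PlaceholderStats and its method names are illustrative, not an existing API:

```csharp
using System.Collections.Generic;
using System.Linq;

static class PlaceholderStats
{
    // Minimum number of arguments String.Format needs: highest index + 1 (0 if none).
    public static int MinimumArgCount(IEnumerable<(int index, string format)> placeholders) =>
        placeholders.Select(p => p.index).DefaultIfEmpty(-1).Max() + 1;

    // Number of distinct argument indexes actually referenced.
    public static int DistinctCount(IEnumerable<(int index, string format)> placeholders) =>
        placeholders.Select(p => p.index).Distinct().Count();

    // Indexes below the maximum that are never referenced ("holes").
    public static IEnumerable<int> Holes(IEnumerable<(int index, string format)> placeholders)
    {
        var used = new HashSet<int>(placeholders.Select(p => p.index));
        int count = MinimumArgCount(placeholders);
        return Enumerable.Range(0, count).Where(i => !used.Contains(i));
    }
}
```

For the "{1:foo}{4:bar}{1:baz}" example, that is 5 arguments needed, 2 distinct indexes used, and holes at 0, 2 and 3.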


I realize I'm (years) late to the party, but I needed something similar to what the OP asked, so I'm sharing the solution I came up with here in case someone finds it useful...

You could use a regular expression to count the {} pairs that contain only an index and nothing else. @"\{\d+\}" is good enough, unless you use formatting options.
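A minimal version of that count (SimplePlaceholderCount is an illustrative name). Keep in mind that it counts occurrences rather than distinct indexes, ignores items that carry alignment or format options, and still matches the {0} inside an escaped {{0}}:

```csharp
using System.Text.RegularExpressions;

static class SimplePlaceholderCount
{
    // Counts "{digits}" items only. Repeated indexes are counted each time they occur,
    // "{0:N2}" or "{0,5}" are not counted at all, and "{{0}}" is (wrongly) counted once.
    public static int Count(string template) =>
        Regex.Matches(template, @"\{\d+\}").Count;
}
```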

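Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.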

 
© 2020-2024 STACKOOM.COM