
Avoiding the first newline in a C++11 raw string literal?

The raw string literals in C++11 are very nice, except that the obvious way to format them leads to a redundant newline \n as the first character.

Consider this example:

    some_code();
    std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

The obvious workaround seems so ugly:

    some_code();
    std::string text = R"(This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

Has anyone found an elegant solution to this?

You can get a pointer to the second character, skipping the leading newline, by adding 1 to the const char* to which the string literal is automatically converted:

    some_code();
    std::string text = 1 + R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

IMHO, the above is flawed in that it breaks with the indentation of the surrounding code. Some languages provide a built-in or library function that does something like:

  • removes an empty leading line, and
  • looks at the indentation of the second line and removes the same amount of indentation from all further lines

That allows usage like:

some_code();
std::string text = unindent(R"(
    This is the first line.
    This is the second line.
    This is the third line.
    )");
more_code();

Writing such a function is relatively simple...

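#include <cctype>   // std::isspace
#include <cstddef>  // size_t
#include <string>

std::string unindent(const char* p)
{
    std::string result;
    if (*p == '\n') ++p;
    const char* p_leading = p;
    // Cast to unsigned char: passing a negative char to std::isspace is undefined.
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;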
    size_t leading_len = p - p_leading;
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            for (size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}

(The slightly weird p_leading[i] approach is intended to make life for people who use tabs and spaces no harder than they make it for themselves ;-P, as long as the lines start with the same sequence.)
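For comparison, the two steps listed earlier (drop an empty leading line, then strip the first text line's indentation from the lines that follow) can also be sketched with std::string_view. This is only a sketch: the name dedent is made up here, it needs C++17, and unlike unindent it treats only spaces and tabs as indentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the answer above): removes one leading
// newline, then strips the first line's indentation from every line that
// begins with exactly that whitespace sequence.
std::string dedent(std::string_view text) {
    if (!text.empty() && text.front() == '\n') text.remove_prefix(1);

    // The indentation is the run of spaces/tabs at the start of the first line.
    const std::size_t indent_len =
        std::min(text.find_first_not_of(" \t"), text.size());
    const std::string_view indent = text.substr(0, indent_len);

    std::string result;
    while (!text.empty()) {
        // Strip the indentation only if this line actually starts with it.
        if (text.substr(0, indent.size()) == indent)
            text.remove_prefix(indent.size());
        // Copy through the end of the line, including its '\n' if present.
        const std::size_t eol = text.find('\n');
        const std::size_t take =
            (eol == std::string_view::npos) ? text.size() : eol + 1;
        result.append(text.substr(0, take));
        text.remove_prefix(take);
    }
    return result;
}
```

Lines indented deeper than the first one keep their extra indentation, and a whitespace-only last line (the one holding the closing delimiter) disappears.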

This is probably not what you want, but just in case, you should be aware of automatic string literal concatenation:

    std::string text =
"This is the first line.\n"
"This is the second line.\n"
"This is the third line.\n";

I recommend @Brian's answer, especially if you only need a few lines of text, or text that you can handle with your text editor-fu. I have an alternative if that isn't the case.

    std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";

Live example

Raw string literals can still concatenate with "normal" string literals, as shown in the code. The "\ at the start is meant to "eliminate" the " character from the first line, putting it on a line of its own instead.
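A small check of what that concatenation produces, assuming nothing beyond the standard library (the variable name spliced is made up for this sketch). The backslash-newline right after the opening quote is removed before the ordinary literal is formed, while the raw literal keeps its newline and its backslash as written:

```cpp
#include <cassert>
#include <string>

// The ordinary literal supplies the first line, the raw literal the rest.
// Adjacent string literals are concatenated by the compiler.
const std::string spliced =
"\
first line" R"(
second line with a literal \n in it)";
```

The resulting string starts with "first line", followed by one real newline; the \n inside the raw part stays a backslash followed by the letter n.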

Still, if I were to decide, I would put such lotsa-text into a separate file and load it at runtime. No pressure on you though :-).

Also, that is one of the uglier pieces of code I've written these days.

The closest I can see is:

std::string text = ""
R"(This is the first line.
This is the second line.
This is the third line.
)";

It would be a bit nicer if whitespace were allowed in the delimiter sequence. Give or take the indentation:

std::string text = R"
    (This is the first line.
This is the second line.
This is the third line.
)
    ";

My preprocessor will let you off with a warning about this, but unfortunately it's a bit useless. Clang and GCC get thrown off completely.

The accepted answer produces the warning cppcoreguidelines-pro-bounds-constant-array-index from clang-tidy. See Pro.bounds: Bounds safety profile for details.

If you don't have std::span but you're at least compiling with C++17, consider:

constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
This is the third line.
)").substr(1);

The main advantages are readability (IMHO) and that you can keep that clang-tidy warning turned on in the rest of your code.

With gcc, if someone inadvertently reduces the raw string to an empty string, this approach gives a compiler error ( demo ), while the accepted approach either produces nothing ( demo ) or, depending on your compiler settings, an "outside bounds of constant string" warning.
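The reason for that compiler error is that std::string_view::substr throws std::out_of_range when the start position is past the end, and a throw is not allowed while evaluating a constant expression. A minimal sketch of both sides, assuming C++17 (the helper names drop_first and empty_input_throws are made up for illustration):

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>

// Same expression as in the answer, wrapped in a function for reuse.
constexpr std::string_view drop_first(std::string_view s) { return s.substr(1); }

// Non-empty input: evaluated entirely at compile time.
constexpr auto text = drop_first("\nfirst line\n");
static_assert(text == "first line\n");

// constexpr auto bad = drop_first("");  // rejected: substr(1) would throw

// At run time the same mistake surfaces as an exception instead.
bool empty_input_throws() {
    try {
        (void)drop_first("");
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```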

Yep, that is annoying. Perhaps there should be raw literals ( R"PREFIX(" ) and multiline raw literals ( M"PREFIX ).

I came up with this alternative which almost describes itself:

#include<cassert>  // assert
#include<iterator> // std::next
...
{
    ...
    ...
    std::string atoms_text = 
std::next/*_line*/(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");
    assert( atoms_text[0] != '\n' );
    ...
}

Limitations:

  1. If the raw literal is empty it will generate an invalid string. But that should be obvious to spot.
  2. If the raw literal doesn't start with a new line it will eat the first character instead.
  3. std::next is constexpr only from C++17; for constexpr use before that, you can write 1+(char const*)R"XYZ(", but it is not as clear and might produce a warning.
constexpr auto atom_text = 1 + (R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");

Also, no warranties ;) . After all, I don't know if it is legal to do arithmetic with pointers to static data.
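On that last doubt: a string literal is an array of const char with static storage duration, so pointer arithmetic that stays inside the array (or lands one past its end) is well-defined. Limitations 1 and 2 can be sidestepped by checking the first character before skipping it. The following is a sketch only, and skip_leading_newline is a hypothetical helper name, not something from the answer:

```cpp
#include <cassert>

// Hypothetical helper: advance past the first character only when it really
// is a newline, so an empty or unformatted literal is returned unchanged.
constexpr const char* skip_leading_newline(const char* s) {
    return (*s == '\n') ? s + 1 : s;
}

constexpr const char* atoms_text = skip_leading_newline(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");

static_assert(atoms_text[0] == ' ');                   // newline was skipped
static_assert(*skip_leading_newline("abc") == 'a');    // nothing eaten
static_assert(*skip_leading_newline("") == '\0');      // empty stays valid
```

The function is a single return statement, so it is usable in constant expressions from C++11 on.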


Another advantage of the + 1 approach is that it can be put at the end:

constexpr auto atom_text = R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ" + 1;

Possibilities are endless:

constexpr auto atom_text = &R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"[1];
constexpr auto atom_text = &1[R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"];
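All of these spellings name the same address, because a[b] is defined as *(a + b) and addition commutes. A quick compile-time check, assuming C++17 for the constexpr std::next; a named array is used here because two separate occurrences of the same literal are not guaranteed to be the same object:

```cpp
#include <cassert>
#include <iterator>

constexpr char text[] = "\nabc";

// Every variant points at the second character of the array.
static_assert(text + 1 == 1 + text);
static_assert(text + 1 == &text[1]);
static_assert(text + 1 == &1[text]);
static_assert(text + 1 == std::next(text));
```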

I had the very same problem and I think the following solution is the best of all the above. I hope it'll be helpful for you, too (see the example in the comment):

/**
 * Strips a multi-line string's indentation prefix.
 *
 * Example:
 * \code
 *   string s = R"(|line one
 *                 |line two
 *                 |line three
 *                 |)"_multiline;
 *   std::cout << s;
 * \endcode
 *
 * This prints three lines: @c "line one\nline two\nline three\n"
 *
 * @author Christian Parpart <christian@parpart.family>
 */

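// The length parameter of a string literal operator must be std::size_t;
// unsigned long is a different type on some platforms (e.g. 64-bit Windows).
inline std::string operator ""_multiline(const char* text, std::size_t /*size*/) {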
  if (!*text)
    return {};

  enum class State {
    LineData,
    SkipUntilPrefix,
  };

  constexpr char LF = '\n';
  State state = State::LineData;
  std::stringstream sstr;
  char sep = *text++;

  while (*text) {
    switch (state) {
      case State::LineData: {
        if (*text == LF) {
          state = State::SkipUntilPrefix;
          sstr << *text++;
        } else {
          sstr << *text++;
        }
        break;
      }
      case State::SkipUntilPrefix: {
        if (*text == sep) {
          state = State::LineData;
          text++;
        } else {
          text++;
        }
        break;
      }
    }
  }

  return sstr.str();
}

With C++20 this can now be implemented fully at compile-time by using a string literal operator template.

That has a few key benefits:

  • Only the unindented string will be stored in the resulting binary.
  • No allocations, zero runtime overhead.
  • The resulting value will be a reference to a character array ( const char (&)[N] ), like normal string literals in C++, so no std::array shenanigans and lifetime issues.

Usage Example: godbolt

std::cout << R"(
     a
    b
     c
    d
)"_M << std::endl;
/* Will print the following:
 a
b
 c
d
*/

// The type of R"(...)"_M is const char (&)[N],
// so it can be used like a normal string literal:
std::cout << std::size(R"(asdf)"_M) << std::endl;
// (will print 5)
constexpr std::string_view str = R"(
  foo
  bar
)"_M;
// str == "foo\nbar"

// also works with wchar_t, char8_t, char16_t and char32_t literals:
std::wcout << LR"(
  foo
  bar
)"_M;
std::wcout << std::endl;

Normally it's not possible to pass string literals as template arguments, e.g.:


template<const char* str>
void foo();

// ill-formed
foo<"bar">();

But with C++20 we can now have class-type template arguments, and those can be constant-initialized from a string literal.

That, in combination with the new string literal operator templates, makes it possible to get the entire string literal as a template parameter:

template<class _char_type, std::size_t size>
struct string_wrapper {
    using char_type = _char_type;

    consteval string_wrapper(const char_type (&arr)[size]) {
        std::ranges::copy(arr, str);
    }

    char_type str[size];
};

template<string_wrapper str>
consteval decltype(auto) operator""_M() {  // no space: `_M` alone is a reserved identifier
    /*...*/
}

// R"(foobar)"_M
// would now result in the following code:
// operator""_M<string_wrapper<char, 7>{"foobar"}>()

Having both the length and the individual characters as constant expressions now lets us compute the required size for the unindented string fully at compile-time and store the resulting string in another template parameter (so that we merely need to return a reference to the final string value):

// unindents the individual lines of a raw string literal
// e.g. unindent_string("  \n  a\n  b\n  c\n") -> "a\nb\nc"
template<class char_type>
consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
    /* ... */
}

// returns the size required for the unindented string
template<class char_type>
consteval std::size_t unindent_string_size(string_view<char_type> str) {
    /* ... */
}

// used for sneakily creating and storing
// the unindented string in a template parameter.
template<string_wrapper sw>
struct unindented_string_wrapper {
    using char_type = typename decltype(sw)::char_type;
    static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
    using array_ref = const char_type (&)[buffer_size];

    consteval unindented_string_wrapper(int) {
        auto newstr = unindent_string<char_type>(sw.str);
        std::ranges::copy(newstr, buffer);
    }

    consteval array_ref get() const {
        return buffer;
    }

    char_type buffer[buffer_size];
};

// uses a defaulted template argument that depends on the str
// to initialize the unindented string within a template parameter.
// this enables us to return a reference to the unindented string.
template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
consteval decltype(auto) do_unindent() {
    return unindented.get();
}

// the actual user-defined string literal operator
template<string_wrapper str>
consteval decltype(auto) operator""_M() {
    return do_unindent<str>();
}

Full Code: godbolt

#include <algorithm>
#include <cstddef>      // std::size_t
#include <string_view>
#include <utility>      // std::declval
#include <vector>
#include <ranges>

namespace multiline_raw_string {
    template<class char_type>
    using string_view = std::basic_string_view<char_type>;

    // characters that are considered space
    // we need this because std::isspace is not constexpr
    template<class char_type>
    constexpr string_view<char_type> space_chars = std::declval<string_view<char_type>>();
    template<>
    constexpr string_view<char> space_chars<char> = " \f\n\r\t\v";
    template<>
    constexpr string_view<wchar_t> space_chars<wchar_t> = L" \f\n\r\t\v";
    template<>
    constexpr string_view<char8_t> space_chars<char8_t> = u8" \f\n\r\t\v";
    template<>
    constexpr string_view<char16_t> space_chars<char16_t> = u" \f\n\r\t\v";
    template<>
    constexpr string_view<char32_t> space_chars<char32_t> = U" \f\n\r\t\v";
    
    
    // list of all potential line endings that could be encountered
    template<class char_type>
    constexpr string_view<char_type> potential_line_endings[] = std::declval<string_view<char_type>[]>();
    template<>
    constexpr string_view<char> potential_line_endings<char>[] = {
        "\r\n",
        "\r",
        "\n"
    };
    template<>
    constexpr string_view<wchar_t> potential_line_endings<wchar_t>[] = {
        L"\r\n",
        L"\r",
        L"\n"
    };
    template<>
    constexpr string_view<char8_t> potential_line_endings<char8_t>[] = {
        u8"\r\n",
        u8"\r",
        u8"\n"
    };
    template<>
    constexpr string_view<char16_t> potential_line_endings<char16_t>[] = {
        u"\r\n",
        u"\r",
        u"\n"
    };
    template<>
    constexpr string_view<char32_t> potential_line_endings<char32_t>[] = {
        U"\r\n",
        U"\r",
        U"\n"
    };

    // null-terminator for the different character types
    template<class char_type>
    constexpr char_type null_char = std::declval<char_type>();
    template<>
    constexpr char null_char<char> = '\0';
    template<>
    constexpr wchar_t null_char<wchar_t> = L'\0';
    template<>
    constexpr char8_t null_char<char8_t> = u8'\0';
    template<>
    constexpr char16_t null_char<char16_t> = u'\0';
    template<>
    constexpr char32_t null_char<char32_t> = U'\0';

    // detects the line ending used within a string.
    // e.g. detect_line_ending("foo\nbar\nbaz") -> "\n"
    template<class char_type>
    consteval string_view<char_type> detect_line_ending(string_view<char_type> str) {
        return *std::ranges::max_element(
            potential_line_endings<char_type>,
            {},
            [str](string_view<char_type> line_ending) {
                // count the number of lines we would get with line_ending
                auto view = std::views::split(str, line_ending);
                return std::ranges::distance(view);
            }
        );
    }

    // returns a view to the leading sequence of space characters within a string
    // e.g. get_leading_space_sequence(" \t  foo") -> " \t  "
    template<class char_type>
    consteval string_view<char_type> get_leading_space_sequence(string_view<char_type> line) {
        return line.substr(0, line.find_first_not_of(space_chars<char_type>));
    }

    // checks if a line consists purely out of space characters
    // e.g. is_line_empty("    \t") -> true
    //      is_line_empty("   foo") -> false
    template<class char_type>
    consteval bool is_line_empty(string_view<char_type> line) {
        return get_leading_space_sequence(line).size() == line.size();
    }

    // splits a string into individual lines
    // and removes the first & last line if they are empty
    // e.g. split_lines("\na\nb\nc\n", "\n") -> {"a", "b", "c"}
    template<class char_type>
    consteval std::vector<string_view<char_type>> split_lines(
        string_view<char_type> str,
        string_view<char_type> line_ending
    ) {
        std::vector<string_view<char_type>> lines;

        for (auto line : std::views::split(str, line_ending)) {
            lines.emplace_back(line.begin(), line.end());
        }

        // remove first/last lines in case they are completely empty
        if(lines.size() > 1 && is_line_empty(lines[0])) {
            lines.erase(lines.begin());
        }
        if(lines.size() > 1 && is_line_empty(lines[lines.size()-1])) {
            lines.erase(lines.end()-1);
        }

        return lines;
    }

    // determines the longest possible sequence of space characters
    // that we can remove from each line.
    // e.g. determine_common_space_prefix_sequence({" \ta", " foo", " \t\tbar"}) -> " "
    template<class char_type>
    consteval string_view<char_type> determine_common_space_prefix_sequence(
        std::vector<string_view<char_type>> const& lines
    ) {
        std::vector<string_view<char_type>> space_sequences = {
            string_view<char_type>{} // empty string
        };

        for(string_view<char_type> line : lines) {
            string_view<char_type> spaces = get_leading_space_sequence(line);
            for(std::size_t len = 1; len <= spaces.size(); len++) {
                space_sequences.emplace_back(spaces.substr(0, len));
            }
   
            // remove duplicates
            std::ranges::sort(space_sequences);
            auto [first, last] = std::ranges::unique(space_sequences);
            space_sequences.erase(first, last);
        }

        // only consider space prefix sequences that apply to all lines
        auto shared_prefixes = std::views::filter(
            space_sequences,
            [&lines](string_view<char_type> prefix) {
                return std::ranges::all_of(
                    lines,
                    [&prefix](string_view<char_type> line) {
                        return line.starts_with(prefix);
                    }
                );
            }
        );

        // select the longest possible space prefix sequence
        return *std::ranges::max_element(
            shared_prefixes,
            {},
            &string_view<char_type>::size
        );
    }

    // unindents the individual lines of a raw string literal
    // e.g. unindent_string("  \n  a\n  b\n  c\n") -> "a\nb\nc"
    template<class char_type>
    consteval std::vector<char_type> unindent_string(string_view<char_type> str) {
        string_view<char_type> line_ending = detect_line_ending(str);
        std::vector<string_view<char_type>> lines = split_lines(str, line_ending);
        string_view<char_type> common_space_sequence = determine_common_space_prefix_sequence(lines);

        std::vector<char_type> new_string;
        bool is_first = true;
        for(auto line : lines) {
            // append newline
            if(is_first) {
                is_first = false;
            } else {
                new_string.insert(new_string.end(), line_ending.begin(), line_ending.end());
            }

            // append unindented line
            auto unindented = line.substr(common_space_sequence.size());
            new_string.insert(new_string.end(), unindented.begin(), unindented.end());
        }

        // add null terminator
        new_string.push_back(null_char<char_type>);

        return new_string;
    }

    // returns the size required for the unindented string
    template<class char_type>
    consteval std::size_t unindent_string_size(string_view<char_type> str) {
        return unindent_string(str).size();
    }

    // simple type that stores a raw string
    // we need this to get around the limitation that string literals
    // are not considered valid non-type template arguments.
    template<class _char_type, std::size_t size>
    struct string_wrapper {
        using char_type = _char_type;

        consteval string_wrapper(const char_type (&arr)[size]) {
            std::ranges::copy(arr, str);
        }

        char_type str[size];
    };

    // used for sneakily creating and storing
    // the unindented string in a template parameter.
    template<string_wrapper sw>
    struct unindented_string_wrapper {
        using char_type = typename decltype(sw)::char_type;
        static constexpr std::size_t buffer_size = unindent_string_size<char_type>(sw.str);
        using array_ref = const char_type (&)[buffer_size];

        consteval unindented_string_wrapper(int) {
            auto newstr = unindent_string<char_type>(sw.str);
            std::ranges::copy(newstr, buffer);
        }

        consteval array_ref get() const {
            return buffer;
        }

        char_type buffer[buffer_size];
    };

    // uses a defaulted template argument that depends on the str
    // to initialize the unindented string within a template parameter.
    // this enables us to return a reference to the unindented string.
    template<string_wrapper str, unindented_string_wrapper<str> unindented = 0>
    consteval decltype(auto) do_unindent() {
        return unindented.get();
    }

    // the actual user-defined string literal operator
    template<string_wrapper str>
    consteval decltype(auto) operator""_M() {
        return do_unindent<str>();
    }
}

using multiline_raw_string::operator""_M;
