Formatting a string into multiple lines of a specific length in C/C++

Question

Is there a common C/C++ library (or common technique) for taking one or more lines of input text and splitting the words into separate lines, where each line of output has a maximum width and words are not split across lines? Whitespace may be either collapsed or preserved. Punctuation must be preserved. A small, compact library is preferred.

I could easily spend an afternoon putting something together that works, but I would like to know if there is something common out there so I don't reinvent the wheel. Bonus points if the input line can contain a format specifier to indicate an indentation level for the output lines.

Example input: "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."

Example output (target width = 60):

123456789012345678901234567890123456789012345678901234567890   Line added to show where 60 is
Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.

Answer 1

Here is a small function that does what you want. It returns a list of the lines. You can remove all of the std:: qualifiers if you want with using namespace std; or, better, with using std::list; using std::string; using std::size_t; but I didn't want to assume you had.

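#include <cstddef>
#include <list>
#include <string>

std::list<std::string> wraptext(std::string input, std::size_t width) {
    std::size_t curpos = 0;   // start of the current line within input
    std::size_t nextpos = 0;  // offset of the break point within the current window

    std::list<std::string> lines;
    std::string substr = input.substr(curpos, width + 1);

    // While more than width characters remain, break at the last space in the window.
    // A window without any space ends the loop, so the rest is emitted as one line.
    while (substr.length() == width + 1 && (nextpos = substr.rfind(' ')) != input.npos) {
        lines.push_back(input.substr(curpos, nextpos));
        curpos += nextpos + 1;
        substr = input.substr(curpos, width + 1);
    }

    // Whatever is left becomes the last line.
    if (curpos != input.length())
        lines.push_back(input.substr(curpos, input.npos));

    return lines;
}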

This program, using that function:

#include <iostream>

int main() {
    std::string input = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";

    std::list<std::string> l = wraptext(input, 60);

    for (auto i = l.begin(); i != l.end(); ++i)
        std::cout << *i << std::endl;

    std::cin.get();  // wait for Enter so a console window stays open
}

prints your example text:

Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.

Answer 2

I think what you may be looking for is:

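/* Needs <stdio.h> and <string.h>. Assumes the original text is stored in src
   and that no word is longer than 60 characters. */
char temp[62];
size_t x = 0, len = strlen(src);
while (len - x > 60)
{
    size_t cnt = 60;
    memcpy(temp, src + x, 61);                 /* 60 columns plus the character that follows */
    while (cnt > 0 && temp[cnt] != ' ') cnt--; /* back up to the last space in the window */
    if (cnt == 0) break;                       /* no usable space: stop wrapping */
    temp[cnt] = '\0';
    x += cnt + 1;
    printf("%s\n", temp);
}
printf("%s\n", src + x);                       /* print whatever is left as the last line */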

Answer 3

If you want to do the job in C, you could try the w_wrap.c and w_wrap.h that I posted to Fidonet C_ECHO 20 years ago or so.

If you want to do the job in C++, it seems like you could simplify the code a bit:

#include <sstream>
#include <string>
#include <iostream>

void wrap(std::string const &input, size_t width, std::ostream &os, size_t indent = 0)
{ 
    std::istringstream in(input);

    os << std::string(indent, ' '); 
    size_t current = indent;
    std::string word;

    while (in >> word) {
        if (current + word.size() > width) {
            os << "\n" << std::string(indent, ' ');
            current = indent;
        }
        os << word << ' ';
        current += word.size() + 1;
    }
}

#ifdef TEST 
int main() { 
    char const *in = "Shankle drumstick corned beef, chuck turkey chicken pork chop"
               " venison beef strip steak cow sausage. Tail short loin shoulder"
               " ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf,"
               " bresaola short loin tri-tip fatback pork loin sirloin shank"
               " flank biltong. Venison short loin andouille.";

    wrap(in, 60, std::cout);
    return 0;
}
#endif

To add indentation, you'd use something like:

wrap(in, 60, std::cout, 5);

Given that you're doing I/O, it probably doesn't matter much in this case, but under other circumstances you might want to consider a different algorithm. Rather than copying one word at a time until you exceed the specified width, you can go directly to the maximum line width in the input and walk backwards through the input string from there until you find whitespace. Given typical word lengths, you'll walk back only about 3 characters on average, rather than walking forward through an average of (say) 60 characters. This is particularly relevant with something like C strings, where you store a pointer to the beginning of each line without copying the content.
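The backward-scan idea can be sketched roughly as follows. This is only an illustration, not the code from w_wrap.c: the function name wrap_views is made up here, it uses C++17 std::string_view in place of raw C pointers, it treats only the space character as a break point, and it assumes that a word longer than the width should stay whole on its own line.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns one view per output line. The views point into
// `text`, so no characters are copied; `text` must outlive the result.
std::vector<std::string_view> wrap_views(std::string_view text, std::size_t width)
{
    std::vector<std::string_view> lines;
    std::size_t start = 0;

    while (true) {
        // Skip spaces so that a line never starts with one.
        start = text.find_first_not_of(' ', start);
        if (start == std::string_view::npos)
            break;

        // Everything that is left fits on one line.
        if (text.size() - start <= width) {
            std::size_t last = text.find_last_not_of(' ');
            lines.push_back(text.substr(start, last - start + 1));
            break;
        }

        // Jump to the first position past the maximum width, then search
        // backwards from there for a space.
        std::size_t cut = text.rfind(' ', start + width);
        if (cut == std::string_view::npos || cut < start) {
            // Assumption: an over-long word is kept whole, so search forwards.
            cut = text.find(' ', start + width);
            if (cut == std::string_view::npos)
                cut = text.size();
        }

        // Drop any spaces immediately before the cut.
        std::size_t end = cut;
        while (end > start && text[end - 1] == ' ')
            --end;
        lines.push_back(text.substr(start, end - start));
        start = cut;
    }
    return lines;
}
```

Calling wrap_views(input, 60) and streaming each element to std::cout followed by '\n' prints the wrapped text; an indent can be added by writing std::string(indent, ' ') before each line and passing width - indent.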

Answer 4

Here is a regex-based approach. Unlike the approaches in the other answers, it also handles newlines in the input string gracefully.

#include <regex>
#include <iostream>
#include <string>

int main() {
  auto test = std::string{"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."};

  // Consume up to 60 characters that are followed by spaces or the end of the input string
  auto line_wrap = std::regex{"(.{1,60})(?: +|$)"};

  // Replace the space or the end of the input string with a new line
  test = regex_replace(test, line_wrap, "$1\n");

  // Trim the new line added for the end of the input string
  test.resize(test.size() - 1);

  std::cout << test << std::endl;
}

Answer 5

Yes: load it into a character array, then use strtok to break it into words, using whitespace as the word delimiter.
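A minimal sketch of that idea, written in C++ around std::strtok, might look like the following. The function name wrap_strtok and the choice to return a single string with embedded newlines are assumptions made for this illustration. Since strtok overwrites delimiters with '\0' and keeps its position in hidden static state, the sketch tokenizes a private copy of the input and is not safe to call from several threads at once.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: wraps `text` to `width` columns by tokenizing with strtok.
std::string wrap_strtok(const std::string& text, std::size_t width)
{
    // strtok writes into its input, so work on a NUL-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    const char* delimiters = " \t\r\n";
    std::string out;
    std::size_t column = 0;  // characters already placed on the current line

    for (char* word = std::strtok(buffer.data(), delimiters);
         word != nullptr;
         word = std::strtok(nullptr, delimiters)) {
        const std::size_t len = std::strlen(word);

        // Start a new line if the separating space plus the word would not fit.
        if (column != 0 && column + 1 + len > width) {
            out += '\n';
            column = 0;
        }
        // Separate the word from the previous one on the same line.
        if (column != 0) {
            out += ' ';
            ++column;
        }
        out += word;
        column += len;
    }
    return out;
}
```

Runs of whitespace are collapsed to a single space, and a word longer than width is placed on a line of its own rather than split.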

Answer 6

Use a function like this:

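#include <stdio.h>
#include <string.h>

void put_multiline(const char *s,int width)
{
  int n,i=0;                     /* i = characters already printed on the current line */
  char t[100];                   /* words longer than 99 characters get split */
  while( 1==sscanf(s,"%99s%n",t,&n) )
  {
    int len=(int)strlen(t);
    if( i && i+1+len>width ) { puts(""); i=0; }  /* space plus word does not fit: new line */
    if( i ) { putchar(' '); ++i; }               /* separate from the previous word */
    fputs(t,stdout);
    i+=len;
    s+=n;                        /* advance past the word just read */
  }
}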

strtok modifies the string it tokenizes; this solution does not. This function also treats every whitespace character as a separator, not only space and tab.

Answer 7

Here's my approach. It's certainly not the fastest, but I tried to make it as readable as possible. The result is the same as your example.

#include <iostream>
#include <string>


std::string splitInLines(std::string source, std::size_t width, std::string whitespace = " \t\r")
{
    std::size_t  currIndex = width - 1;
    std::size_t  sizeToElim;
    while ( currIndex < source.length() )
    {
        currIndex = source.find_last_of(whitespace,currIndex + 1); 
        if (currIndex == std::string::npos)
            break;
        currIndex = source.find_last_not_of(whitespace,currIndex);
        if (currIndex == std::string::npos)
            break;
        sizeToElim = source.find_first_not_of(whitespace,currIndex + 1) - currIndex - 1;
        source.replace( currIndex + 1, sizeToElim , "\n");
        currIndex += (width + 1); //due to the recently inserted "\n"
    }
    return source;
}

int main() {
    std::string source = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";
    std::string result = splitInLines(source , 60);
    std::cout << result;
    return 0;
}

Answer 8

You could probably use regex substitution: replace /(.*){,60}? +/ with $1\\n, advance the string pointer and repeat (note: the ? is supposed to mean non-greedy matching).

If properly implemented, the conversion could even be done in place.
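To illustrate what "in place" could mean here, the sketch below overwrites one space per line break with '\n' directly in the buffer, so the string never changes length. Note that it swaps the regex for a plain backward search with std::string::rfind; the function name wrap_in_place is made up for this example, and it assumes words are separated by single spaces and that an over-long word is left whole.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps `text` in place by turning one space per break
// into '\n'. Runs of spaces are not collapsed.
void wrap_in_place(std::string& text, std::size_t width)
{
    std::size_t start = 0;  // index of the first character of the current line

    while (text.size() - start > width) {
        // Last space that still keeps the line within `width` columns.
        std::size_t cut = text.rfind(' ', start + width);
        if (cut == std::string::npos || cut < start) {
            // Assumption: an over-long word stays whole, so break at the next space.
            cut = text.find(' ', start + width);
            if (cut == std::string::npos)
                return;  // no space left: the rest stays on one line
        }
        text[cut] = '\n';
        start = cut + 1;
    }
}
```

Because only single characters are overwritten, the same loop works on a writable char array as well as on a std::string.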

8 answers

Answer 1
1 2011-07-31 19:31:42

Answer 2
1 2011-07-31 19:35:10

Answer 3
1 2011-08-01 06:19:15

Answer 4
0 2019-01-13 10:34:08

Answer 5
0 2011-07-31 19:17:22

Answer 6
0 2011-07-31 21:25:18

Answer 7
0 accepted 2011-07-31 21:52:15

Answer 8
0 2011-08-01 06:28:46
