
Formatting a string into multiple lines of a specific length in C/C++

Is there a common C/C++ library (or common technique) for taking one or more lines of input text and splitting the words into separate lines, where each output line has a maximum width and words are not split across lines? Collapsing or preserving whitespace are both fine. Punctuation must be preserved. A small, compact library is preferred.

I could easily spend an afternoon putting something together that works, but I would like to know if there is something common out there so I don't reinvent the wheel. Bonus points if the input line can contain a format specifier to indicate an indentation level for the output lines.

Example input: "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."

Example output (target width = 60)

123456789012345678901234567890123456789012345678901234567890   Line added to show where 60 is
Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.

Here is a small function that does what you want. It returns a list of the lines. You can drop all of the std:: qualifiers with using namespace std; or, better, with using std::list; using std::string; using std::size_t; but I didn't want to assume you had.

#include <iostream>
#include <list>
#include <string>

std::list<std::string> wraptext(std::string input, std::size_t width) {
    std::size_t curpos = 0;
    std::size_t nextpos = 0;

    std::list<std::string> lines;
    // Take width + 1 characters so that a space right after a full-width line still counts as a break.
    std::string substr = input.substr(curpos, width + 1);

    while (substr.length() == width + 1 && (nextpos = substr.rfind(' ')) != input.npos) {
        lines.push_back(input.substr(curpos, nextpos));
        curpos += nextpos + 1;
        substr = input.substr(curpos, width + 1);
    }

    // Whatever is left either fits on one line or contains no space to break at.
    if (curpos != input.length())
        lines.push_back(input.substr(curpos, input.npos));

    return lines;
}

This program using that function:

int main() {
    std::string input = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";

    std::list<std::string> l = wraptext(input, 60);

    for (auto i = l.begin(); i != l.end(); ++i)
        std::cout << *i << std::endl;

    std::cin.get();
}

Prints your example text:

Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.

I think what you may be looking for is:

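char temp[61];                      // 60 characters plus the terminating NUL
size_t len = strlen(src);           // Assuming the original is stored in src
size_t x = 0;
while (x < len)
{
    size_t cnt = len - x;           // characters still to print
    if (cnt > 60)
    {
        cnt = 60;
        while (cnt > 0 && src[x + cnt] != ' ') cnt--;   // back up to a space
        if (cnt == 0) cnt = 60;     // no space found: hard-break the long word
    }
    memcpy(temp, src + x, cnt);
    temp[cnt] = '\0';
    printf("%s\n", temp);
    x += cnt;
    if (src[x] == ' ') x++;         // skip the space the line was broken at
}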

If you want to do the job in C, you could try the w_wrap.c and w_wrap.h that I posted to Fidonet C_ECHO 20 years ago or so.

If you want to do the job in C++, it seems like you could simplify the code a bit:

#include <sstream>
#include <string>
#include <iostream>

void wrap(std::string const &input, size_t width, std::ostream &os, size_t indent = 0)
{ 
    std::istringstream in(input);

    os << std::string(indent, ' '); 
    size_t current = indent;
    std::string word;

    while (in >> word) {
        if (current + word.size() > width) {
            os << "\n" << std::string(indent, ' ');
            current = indent;
        }
        os << word << ' ';
        current += word.size() + 1;
    }
}

#ifdef TEST 
int main() { 
    char const *in = "Shankle drumstick corned beef, chuck turkey chicken pork chop"
               " venison beef strip steak cow sausage. Tail short loin shoulder"
               " ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf,"
               " bresaola short loin tri-tip fatback pork loin sirloin shank"
               " flank biltong. Venison short loin andouille.";

    wrap(in, 60, std::cout);
    return 0;
}
#endif

To add indentation, you'd use something like:

wrap(in, 60, std::cout, 5);

Given that you're doing I/O, it probably doesn't matter much in this case, but if you were doing this under other circumstances, you might want to consider a different algorithm. Rather than copy one word at a time until you exceed the specified width, you can go directly to the maximum line width in the input, and walk backwards through the input string from there until you find whitespace. At least given typical word lengths, you'll only walk back somewhere around 3 characters on average, rather than walking forward through an average of (say) 60 characters. This would be particularly relevant using something like C strings, where you were storing a pointer to the beginning of each line, without copying the content.
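The backward-scanning idea described above might look roughly like the following. This is only a sketch under a few assumptions: it requires C++17 for std::string_view, it breaks only on the space character, and the name wrap_views is made up for illustration. Each returned view points into the original text, so no characters are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the answer above): returns views into `text`,
// one per output line, without copying any characters.
std::vector<std::string_view> wrap_views(std::string_view text, std::size_t width)
{
    assert(width > 0);
    std::vector<std::string_view> lines;
    std::size_t start = 0;

    while (start < text.size()) {
        // Skip the spaces left over from the previous break.
        start = text.find_first_not_of(' ', start);
        if (start == std::string_view::npos)
            break;

        // The remainder fits on one line.
        if (text.size() - start <= width) {
            lines.push_back(text.substr(start));
            break;
        }

        // Jump to the character just past the widest possible line, then
        // search backwards for the nearest space.
        std::size_t brk = text.rfind(' ', start + width);
        if (brk == std::string_view::npos || brk <= start)
            brk = start + width;   // no space in range: hard-break the long word

        // Drop any spaces that precede the break point.
        std::size_t end = brk;
        while (end > start && text[end - 1] == ' ')
            --end;
        lines.push_back(text.substr(start, end - start));
        start = brk;
    }
    return lines;
}
```

The views stay valid only as long as the underlying text does, so the caller has to keep the input alive while printing the lines.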

Here is a regex-based approach. Unlike the approaches in the other answers, it also handles newlines in the input string gracefully.

#include <regex>
#include <iostream>
#include <string>

int main() {
  auto test = std::string{"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."};

  // Consume up to 60 characters that are followed by spaces or the end of the input string
  auto line_wrap = std::regex{"(.{1,60})(?: +|$)"};

  // Replace the spaces or the end of the input string with a new line
  test = std::regex_replace(test, line_wrap, "$1\n");

  // Trim the new line added for the end of the input string
  test.resize(test.size() - 1);

  std::cout << test << std::endl;
}

Yes: load it into a character array, then use strtok to break it into words, using a space as the word delimiter.
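A minimal sketch of that strtok idea, assuming words are separated only by spaces and that the wrapped text should come back as a single string; the name wrap_strtok is hypothetical. Because strtok overwrites delimiters in its argument, the sketch tokenizes a private copy of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a private copy of `input` with std::strtok
// and greedily packs the words into lines of at most `width` characters.
// A single word longer than `width` is left on a line of its own.
std::string wrap_strtok(const std::string &input, std::size_t width)
{
    assert(width > 0);
    // std::strtok writes NULs into its argument, so work on a copy.
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');

    std::string out;
    std::size_t col = 0;                          // characters on the current line

    for (char *word = std::strtok(buf.data(), " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        std::size_t len = std::strlen(word);
        if (col != 0 && col + 1 + len > width) {  // word plus separator would overflow
            out += '\n';
            col = 0;
        }
        if (col != 0) {                           // separate words on the same line
            out += ' ';
            ++col;
        }
        out += word;
        col += len;
    }
    return out;
}
```

Note that strtok keeps hidden static state, so this sketch is not safe to call from several threads at once.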

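You could use a function like this:

#include <stdio.h>
#include <string.h>

void put_multiline(const char *s, int width)
{
  int n, i = 0;                 /* i = number of characters on the current line */
  char t[100];
  while (1 == sscanf(s, "%99s%n", t, &n))
  {
    int len = (int)strlen(t);
    if (i && i + 1 + len > width) { puts(""); i = 0; }  /* word plus its space would overflow */
    if (i) { putchar(' '); ++i; }
    fputs(t, stdout);
    i += len;
    s += n;
  }
  if (i) puts("");              /* finish the last line */
}

strtok will destroy your string; this solution does not. This function also works with all whitespace characters, not only spaces and tabs.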

Here's my approach. It's certainly not the fastest, but I tried to make it as readable as possible. The result is the same as your example.

#include <iostream>
#include <string>


std::string splitInLines(std::string source, std::size_t width, std::string whitespace = " \t\r")
{
    std::size_t  currIndex = width - 1;
    std::size_t  sizeToElim;
    while ( currIndex < source.length() )
    {
        currIndex = source.find_last_of(whitespace,currIndex + 1); 
        if (currIndex == std::string::npos)
            break;
        currIndex = source.find_last_not_of(whitespace,currIndex);
        if (currIndex == std::string::npos)
            break;
        sizeToElim = source.find_first_not_of(whitespace,currIndex + 1) - currIndex - 1;
        source.replace( currIndex + 1, sizeToElim , "\n");
        currIndex += (width + 1); //due to the recently inserted "\n"
    }
    return source;
}

int main() {
    std::string source = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";
    std::string result = splitInLines(source , 60);
    std::cout << result;
    return 0;
}

You could probably use regex substitution: replace /(.*){,60}? +/ with $1\n, advance the string pointer and repeat (note: the ? is supposed to mean non-greedy matching).

If properly implemented, the conversion could even be done in place.
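One way an in-place conversion could work, without a regex, is to overwrite the space at each break point with a newline. The sketch below is only an illustration under stated assumptions: words are separated by single spaces, a word longer than the width is left unbroken, and the name wrap_in_place is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: wraps `text` in place by overwriting the space at each
// break point with '\n'. Runs of spaces are left as they are.
void wrap_in_place(std::string &text, std::size_t width)
{
    assert(width > 0);
    std::size_t start = 0;                       // first character of the current line

    while (text.size() - start > width) {
        // Last space that still keeps the line at most `width` characters long.
        std::size_t brk = text.rfind(' ', start + width);
        if (brk == std::string::npos || brk < start) {
            // The word is longer than `width`: break at the first space after it.
            brk = text.find(' ', start + width);
            if (brk == std::string::npos)
                break;
        }
        text[brk] = '\n';
        start = brk + 1;
    }
}
```

Since the string never changes length, no memory is allocated and every index into the original text stays valid after wrapping.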
