简体   繁体   中英

Splitting long string without breaking words fulfilling lines

Before you think that it's duplicated (there are many question asking how to split long strings without breaking words) take in mind that my problem is a bit different: order is not important and I've to fit the words in order to use every line as much as possible.

I've a unordered set of words and I want to combine them without using more than 253 characters.

def compose(words):
    result = " ".join(words)
    if len(result) > 253:
        pass # this should not happen!
    return result

My problem is that I want to try to fill the line as much as possible. For example:

words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters

# This is good because it's the shortest possible list,
#   but I don't know how could I get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]

# This is bad because len(bad) > len(good)
#   even if the limit of 5 characters is respected
# This is equivalent to:
#   bad  = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)

How could I do?

This is the bin packing problem ; the solution is NP-hard, although there exist non-optimal heuristic algorithms, principally first fit decreasing and best fit decreasing. See https://github.com/type/Bin-Packing for implementations.

Non-optimal offline fast 1D bin packing Python algorithm

def binPackingFast(words, limit, sep=" "):
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    return res

Performance

Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch ) it can do half million words in a second on my slow dual core laptop, with an efficiency of at least 90% with those parameters:

limit = max(map(len, words))
sep = ""

With limit *= 1.5 I get 92%, with limit *= 2 I get 96% (same execution time).

Optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)

no efficient bin-packing algorithm can be guaranteed to do better

Source: http://mathworld.wolfram.com/Bin-PackingProblem.html

Moral of the story

While it's interesting to find the best solution, I think that for the most cases it would be much better to use this algorithm for 1D offline bin packing problems.

Resources

Notes

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM