
Trying to print a single string on multiple lines of fixed length and minimizing the cost

First, some background: I have just started with algorithms (and I now feel I lack the logic and reasoning power to excel at them). I have been trying to print "This is a sample text" on several lines with a maximum of 7 characters per line, so a first attempt gives:

this is [cost = 0*0*0; no spaces are left at the end of the line]
a [cost = 6*6*6; the number of spaces left at the end of each line is cubed, and that is the line's cost]
sample [cost=1*1*1]
text [cost= 3*3*3]

(Total cost = 0+216+1+27=244)

Now this can be improved to:

this [cost 3*3*3]
is a [cost 3*3*3]
sample [cost 1*1*1]
text [cost 3*3*3]

[Total cost = 27+27+1+27 = 82]
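The arithmetic for the two layouts can be checked with a few lines of Python. This is only a sketch: `layout_cost` is a hypothetical helper name, and it charges every line, including the last one, exactly as the totals above do.

```python
def layout_cost(lines, width):
    """Sum of the cubed number of unused columns on every line."""
    return sum((width - len(line)) ** 3 for line in lines)

first_layout = ["this is", "a", "sample", "text"]
second_layout = ["this", "is a", "sample", "text"]

print(layout_cost(first_layout, 7))   # 0 + 216 + 1 + 27 = 244
print(layout_cost(second_layout, 7))  # 27 + 27 + 1 + 27 = 82
```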

So clearly a greedy approach does not work here and dynamic programming is needed instead, but my problem is that I cannot figure out the substructure that will be reused. I am really stuck on how to link the cost condition to the printing in Python. I can index each word and I can get the length of each word, but I am stuck on what to do next. When I print, all that happens is that the entire string gets printed on one line each (this is as far as I have got). I apologize if this is a really silly question, but I am stuck and really need some help on this. Thanks.
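To make the failure of the greedy approach concrete, here is a minimal sketch of a greedy wrapper (the name `greedy_wrap` is made up for this illustration). It packs as many words as fit on each line and reproduces the first layout, the one that costs 244.

```python
def greedy_wrap(words, width):
    """Fill each line with as many words as fit before starting the next one."""
    lines = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        # An over-long word on an empty line is accepted as-is.
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

lines = greedy_wrap("this is a sample text".split(), 7)
total = sum((7 - len(line)) ** 3 for line in lines)
print(lines, total)  # ['this is', 'a', 'sample', 'text'] 244
```

Filling the first line completely leaves "a" alone on the second line, and the cubed penalty for those six spaces outweighs everything saved on the first line.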


This is how I have tried implementing the code. I also ran some tests against it; the tests were written by my friend, and I don't think I am getting it right. Any help or suggestion is appreciated.

print_test.py

import os
import sys
from glob import glob

#TODO -- replace this with your solution
from printing import print_neatly

log = open('output.log', 'w')

#This tests the code against my own text
maxline = 80
for source in glob('*.txt'):
    with open(source) as f:
        fulltext = f.read()

    words = fulltext.split()
    (cost, text) = print_neatly(words, maxline)

    #double check the cost
    #lines = text.split('\n')
    truecost = 0
    for line in text[0:-1]:
        truecost += (maxline - len(line))**3

    #print the output and cost
    print >>log, '----------------------'
    print >>log, source
    print >>log, '----------------------'
    print >>log, text
    print >>log, '----------------------'
    print >>log, 'cost = ', cost
    print >>log, 'true cost = ', truecost
    print >>log, '----------------------'

log.close()

#print the log
with open('output.log') as f: print f.read()

printing.py

def print_neatly(wordlist, max):
    #strings='This is a sample string'

    #splitting the string and taking out words from it
    #wordlist=strings.split()
    (cost, dyn_print) = print_line(wordlist, len(wordlist), max)
    for dyn in dyn_print:
        print dyn
    return cost, dyn_print

def cost(lines, max):
    return sum([(max-len(x)) ** 3 for x in lines])

def print_line(wordlist, count, max, results = {}):
    results = [([],0)]
    for count in range(1, len(wordlist) + 1):
        best = wordlist[:count]
        best_cost = cost(best, max)
        mycount = count - 1
        line = wordlist[mycount]
        while len(line) <= max:
            attempt, attempt_cost = results[mycount]
            attempt = attempt + [line]
            attempt_cost += cost([line],max)
            if attempt_cost < best_cost:
                best = attempt
                best_cost = attempt_cost
            if mycount > 0:
                mycount -= 1
                line = wordlist[mycount] + ' ' + line
            else:
                break
        results += [(best, best_cost)]

    #print best
    #print best_cost
    return (best_cost, best)


#print_neatly(0,7)

The text files that need to be tested give me this output. The two costs should be the same, but they are not. Can someone point out where I am going wrong?


cost = 16036

true cost = 15911
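One possible reading of these numbers, inferred from the code shown rather than confirmed anywhere in the thread: the two figures differ by 16036 - 15911 = 125 = 5**3. cost() in printing.py charges every line, while the loop in print_test.py runs over text[0:-1] and therefore skips the last line. A last line of 75 characters at maxline = 80 would account for exactly that gap. The sketch below uses made-up stand-in lines, and both function names are hypothetical.

```python
def cost_all_lines(lines, width):
    # What cost() in printing.py computes: every line is charged.
    return sum((width - len(line)) ** 3 for line in lines)

def cost_without_last(lines, width):
    # What print_test.py computes: the last line is free.
    return sum((width - len(line)) ** 3 for line in lines[:-1])

# Hypothetical wrapped text: lines of 80, 78 and 75 characters.
lines = ["x" * 80, "y" * 78, "z" * 75]
print(cost_all_lines(lines, 80))     # 0 + 8 + 125 = 133
print(cost_without_last(lines, 80))  # 0 + 8 = 8
```

If that is the cause, the two numbers agree once both sides use the same convention for the last line.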

One approach is to list all possible alternatives and pick the one with the minimum cost:

from functools import wraps

def cache(origfunc):
    d = {}
    @wraps(origfunc)
    def wrapper(*args):
        if args in d:
            return d[args]
        result = origfunc(*args)
        d[args] = result
        return result
    return wrapper

@cache
def alternatives(t, m=7):
    ''' Given a tuple of word lengths and a maximum line length,
        return a list of all possible line groupings
        showing the total length of each line.

        >>> alternatives((4, 2, 1, 3), 7)
        [[4, 2, 1, 3], [4, 2, 5], [4, 4, 3], [7, 1, 3], [7, 5]]

    '''
    if not t:
        return []
    alts = []
    s = 0
    for i, x in enumerate(t):
        s += x
        if s > m:
            break
        tail = t[i+1:]
        if not tail:
            alts.append([s])
            break
        for subalt in alternatives(tail, m):
            alts.append([s] + subalt)
        s += 1
    return alts

def cost(t, m=7):
    ''' Evaluate the cost of lines given their line lengths

            >>> cost((7, 1, 6, 4), m=7)  # 'this is', 'a', 'sample', 'text'
            244
            >>> cost((4, 4, 6, 4))       # 'this', 'is a', 'sample', 'text'
            82

    '''
    return sum((m - x) ** 3 for x in t)

def textwrap(s, m=7):
    ''' Given a string, return a list of strings with optimal line wrapping

        >>> print(textwrap('This is a sample text', 7))
        ['This', 'is a', 'sample', 'text']

    '''
    words = s.split()
    t = tuple(map(len, words))
    lengths = min(alternatives(t, m), key=cost)
    result = []
    worditer = iter(words)
    for length in lengths:
        line = []
        s = 0
        while s < length:
            word = next(worditer)
            line.append(word)
            s += len(word) + 1
        result.append(' '.join(line))
    return result


if __name__ == '__main__':
    import doctest
    print(doctest.testmod())

The code can be sped up by limiting the number of alternatives searched (perhaps to the three longest alternatives on each line).
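To get a feel for why a speed-up matters, the number of alternatives can be counted without building the lists. This is a rough sketch: `count_alternatives` is a hypothetical helper that mirrors the loop in alternatives(), and it assumes Python 3.2+ for functools.lru_cache.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alternatives(lengths, m):
    """Number of ways to group the word lengths into lines of at most m chars."""
    if not lengths:
        return 1  # exactly one way to lay out nothing
    total = 0
    width = 0
    for i, n in enumerate(lengths):
        width += n
        if width > m:
            break
        total += count_alternatives(lengths[i + 1:], m)
        width += 1  # space before the next word on this line
    return total

print(count_alternatives((4, 2, 1, 3), 7))  # 5, matching the doctest above
print(count_alternatives((1,) * 20, 80))    # 2**19 = 524288
```

When every grouping fits on a line, n words have 2**(n-1) groupings, so listing them all only works for short inputs.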

If there's a "best" way to arrange one word, two words, etc. into lines, that's not going to change based on what lines come later. It can change based on what words come later, if the words are small enough to join others on a line. But if we take those words in isolation and try to arrange them into lines, the same set of solutions will always be optimal. (There may be equivalent answers; for example, given the criteria, "cats in hats" on 7-char lines has two solutions. Both are "best", and always will be -- and we can decide on either one and stick with it without sacrificing correctness.)

  • "This" will always be best as ["This"]. (Note, I'm not saying it will always be best on a line by itself! What I am saying is that if you have the one word, the single best way to arrange it is on one line.)

  • "This is" can be arranged as ["This", "is"] or as ["This is"]. The latter, however, is best. So from here on, whenever we only have these two words to consider, we can ignore ["This", "is"] entirely -- it will never be superior.

  • "This is a" can be arranged as ["This", "is", "a"], ["This is", "a"], or ["This", "is a"]. (We already know that ["This is"] is superior to ["This", "is"] -- see the previous bullet point!) It turns out ["This", "is a"] is best, so we can ignore ["This is", "a"] from here on.

  • "This is a sample" can be arranged as:

    • ["This", "is", "a", "sample"] (see bullet #2 -- we don't even have to look at this)
    • ["This is", "a", "sample"] (see bullet #3)
    • ["This", "is a", "sample"]
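The prefix-by-prefix reasoning in the bullets above can be reproduced with a short sketch. `best_prefix_layouts` is a hypothetical name; the sketch assumes every word fits within the limit and, like the bullets, charges every line, including the last.

```python
def best_prefix_layouts(words, limit):
    """Return the best (lines, cost) for the first 1, 2, ..., len(words) words."""
    best = [([], 0)]  # best[k] holds the best layout of the first k words
    for count in range(1, len(words) + 1):
        candidate = None
        line = ""
        # Try every possible last line: words[start:count] joined by spaces.
        for start in range(count - 1, -1, -1):
            line = words[start] if not line else words[start] + " " + line
            if len(line) > limit:
                break
            prev_lines, prev_cost = best[start]
            total = prev_cost + (limit - len(line)) ** 3
            if candidate is None or total < candidate[1]:
                candidate = (prev_lines + [line], total)
        best.append(candidate)
    return best[1:]

for lines, total in best_prefix_layouts("This is a sample".split(), 7):
    print(lines, total)
# ['This'] 27
# ['This is'] 0
# ['This', 'is a'] 54
# ['This', 'is a', 'sample'] 55
```

Each prefix only combines a new last line with an already-known best layout of a shorter prefix, which is the reusable substructure the question asks about.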

I don't know Python; I just hacked this together. So forgive me if it's "un-Pythonic" or whatever. :P

def cost(lines, limit):
    # figures the cost of the current arrangement of words in lines.
    return sum([(limit-len(x)) ** 3 for x in lines])


def lineify(words, limit):
    # splits up words into lines of at most (limit) chars.
    # should find an optimal solution, assuming all words are < limit chars long

    results = [([], 0)]

    for count in range(1, len(words) + 1):
        best = words[:count]         # (start off assuming one word per line)
        best_cost = cost(best, limit)
        mycount = count - 1
        line = words[mycount]        # start with one word

        while len(line) <= limit:
            # figure the optimal cost, assuming the other words are on another line
            attempt, attempt_cost = results[mycount]
            attempt = attempt + [line]
            attempt_cost += cost([line],limit)
            # print attempt
            if attempt_cost < best_cost:
                best = attempt
                best_cost = attempt_cost

            # steal another word.  if there isn't one, we're done
            if mycount > 0:
                mycount -= 1
                line = words[mycount] + ' ' + line
            else:
                break

        # once we have an optimal result for (count) words, save it for posterity
        results += [(best, best_cost)]

    return results[len(words)][0]


def wrap(phrase, limit):
    # helper function...so the caller doesn't have to pass an array of words.
    # they shouldn't need to know to do that
    words = phrase.split()
    return lineify(words, limit)
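lineify() copies a list of lines for every candidate it tries. A possible variation, sketched here under the same assumption that every word fits within the limit, stores only the best cost and the index where the last line starts, then rebuilds the lines at the end. The name `wrap_with_breaks` and the returned (lines, cost) pair are choices made for this illustration, not part of the answer above.

```python
def wrap_with_breaks(words, limit):
    """Optimal wrapping that stores costs and break indices, not line lists."""
    n = len(words)
    best_cost = [0] + [float("inf")] * n  # best_cost[k]: first k words
    last_start = [0] * (n + 1)            # where the last line of best[k] begins
    for count in range(1, n + 1):
        width = -1  # cancels the extra space counted for the first word
        for start in range(count - 1, -1, -1):
            width += len(words[start]) + 1
            if width > limit:
                break
            total = best_cost[start] + (limit - width) ** 3
            if total < best_cost[count]:
                best_cost[count] = total
                last_start[count] = start
    # Walk the break indices backwards to rebuild the lines.
    lines = []
    end = n
    while end > 0:
        start = last_start[end]
        lines.append(" ".join(words[start:end]))
        end = start
    return lines[::-1], best_cost[n]

print(wrap_with_breaks("This is a sample text".split(), 7))
# (['This', 'is a', 'sample', 'text'], 82)
```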

I originally had a recursive solution, but it turns out that Python places some limits on recursion that make it unsuitable when a decent-sized text and a real-world length limit come into play. (You have to backtrack all the way to the beginning anyway before anything gets memoized, and if I had over about 1000 words, I ended up hitting recursion limits. This could be extended by starting with enough words to fill the last line, but it would still limit the maximum to some multiple of the original limit.) I found myself using a hack to build up the results until the recursion limit was no longer an issue. If you have to do that, though, that's perhaps an indication that the recursion itself is an issue.
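The recursion problem can be reproduced with a small sketch. This is not the original recursive solution, which is not shown in the thread; `best_cost_recursive` is a hypothetical memoized version written only to illustrate the depth issue, and it assumes Python 3 for RecursionError.

```python
import sys

def best_cost_recursive(lengths, limit, n=None, memo=None):
    """Minimum total cost of arranging the first n words (recursive, memoized)."""
    if memo is None:
        memo = {}
    if n is None:
        n = len(lengths)
    if n == 0:
        return 0
    if n in memo:
        return memo[n]
    best = float("inf")
    width = -1
    for start in range(n - 1, -1, -1):
        width += lengths[start] + 1
        if width > limit:
            break
        # The first call here recurses on n - 1 words before anything is cached.
        sub = best_cost_recursive(lengths, limit, start, memo)
        best = min(best, sub + (limit - width) ** 3)
    memo[n] = best
    return best

print(best_cost_recursive([4, 2, 1, 6], 7))  # 55 for "This is a sample"

many = [4] * (2 * sys.getrecursionlimit())
try:
    best_cost_recursive(many, 80)
except RecursionError:
    print("recursion limit hit for", len(many), "words")
```

The memo does not help with depth, because the call chain reaches the first word before any result is stored. The iterative lineify() above avoids this by filling the table from the front.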

This algorithm relies on the assumption that if we know the optimal solutions for the last N-1, N-2, ..., 2, 1 words in the text, then it is easy to construct the optimal solution for N words. Memoization avoids recomputing the results of best_partition() calls for the same input:

import functools
import sys  # used by the example at the end

def wrap(text, width):
    """
    >>> wrap('This is a sample text', 7)
    ['This', 'is a', 'sample', 'text']
    """
    return [' '.join(line) for line in best_partition(
        tuple(text.split()), functools.partial(cost, width=width))]

def best_partition(words, cost):
    """The best partition of words into lines according to the cost function."""
    best = [words] # start with all words on a single line
    for i in reversed(range(1, len(words))): # reverse to avoid recursion limit
        lines = [words[:i]] + best_partition(words[i:], cost)
        if cost(lines) < cost(best):
            best = lines
    return best

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            ret = cache[args] = func(*args)
            return ret
    return wrapper

best_partition = memoize(best_partition)

where cost() is:

def linelen(words):
    """Number of characters in a line created from words."""
    if not words: return 0
    # words + spaces between them
    return sum(map(len, words)) + len(words) - 1

def cost(lines, width):
    """
    - each line except last costs `(width - w)**3`, where `w` is the
      line width

    - cost is infinite if `w > width` and the line has more than one word

    >>> cost([['a'], ['b']], 1)
    0
    >>> cost([['a','b']], 1)
    inf
    >>> cost([['a'], ['b']], 3)
    8
    >>> cost([['a', 'b']], 2)
    inf
    """
    if not lines: return 0
    s = 0
    for i, words in enumerate(lines, 1):
        w = linelen(words)
        if width >= w:
            if i != len(lines): # last line has zero cost
                s += (width - w)**3
        elif len(words) != 1: # more than one word in the line
            return float("inf") # penalty for w > width
    return s

Example:

print('\n'.join(wrap("""
    In olden times when wishing still helped one, there lived a king whose
    daughters were all beautiful, but the youngest was so beautiful that
    the sun itself, which has seen so much, was astonished whenever it
    shone in her face. Close by the king's castle lay a great dark forest,
    and under an old lime-tree in the forest was a well, and when the day
    was very warm, the king's child went out into the forest and sat down
    by the side of the cool fountain, and when she was bored she took a
    golden ball, and threw it up on high and caught it, and this ball was
    her favorite plaything.
    """, int(sys.argv[1]) if len(sys.argv) > 1 else 70)))

Output:

In olden times when wishing still helped one, there lived a king whose
daughters were all beautiful, but the youngest was so beautiful that
the sun itself, which has seen so much, was astonished whenever it
shone in her face. Close by the king's castle lay a great dark forest,
and under an old lime-tree in the forest was a well, and when the day
was very warm, the king's child went out into the forest and sat down
by the side of the cool fountain, and when she was bored she took a
golden ball, and threw it up on high and caught it, and this ball was
her favorite plaything.
