Setting a minimum run length for run-length encoding

def encode (plainText):
    res=''
    a=''
    for i in plainText:
        if a.count(i)>0:
           a+=i
        else:
            if len(a)>3:
                res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
                a=i
    return(res)

This is my current code. For those of you who know about run-length encoding, it can make files larger, because a single value becomes two (a count and the value). I am trying to set a minimum run length of 3 so that it actually compresses. Any help with code corrections or suggestions is greatly appreciated.
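To make the size problem concrete, here is a minimal sketch of a naive encoder that writes a count for every run, including runs of length 1. The function name naive_rle and the <count><char> output format are assumptions for illustration, not the format used in the code above.

```python
from itertools import groupby

def naive_rle(text):
    # Hypothetical naive encoder: every run, even of length 1,
    # becomes "<count><char>".
    return ''.join(
        f"{len(list(group))}{char}" for char, group in groupby(text)
    )

print(naive_rle("abc"))     # 1a1b1c  (3 characters grow to 6)
print(naive_rle("aaaaaa"))  # 6a      (6 characters shrink to 2)
```

Text without long runs doubles in size, which is why a minimum run length is needed before encoding pays off.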

This should work:

plainText = "Hellow world"

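def encode (plainText):
    count = 0
    last_sym = ""
    rle = ""

    for i in plainText:
        if i == last_sym:
            count = count + 1
        else:
            # A run just ended: emit it using the previous symbol, not i
            if count > 2:
                # Zero-pad the count to two digits, e.g. 3 -> "03"
                rle = rle + str(count).zfill(2) + last_sym
            else:
                rle = rle + last_sym * count
            count = 1
            last_sym = i

    # Emit the final run, which the loop itself never flushes
    if count > 2:
        rle = rle + str(count).zfill(2) + last_sym
    else:
        rle = rle + last_sym * count

    return rle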

rle = encode(plainText)
print(rle)

You've made quite a few errors in there; here's a short list.

1  def encode (plainText):
2      res=''
3      a=''
4      for i in plainText:
5          if a.count(i)>0:
6             a+=i
7          else:
8              if len(a)>3:
9                  res+="/" + str(len(a)) + a[0][:1]
10             else:
11                 res+=a
12                 a=i
13     return(res)
  • Lines [3], [5]: You store the letters in a string and repeatedly call count. It would be easier (and faster) to store only the last character and add a new variable to act as a counter.
  • Lines [8], [9]: Whenever a long enough run ends, you add the (correct) encoding string. However, you never update a. So once you reach this line of code for the first time, every following character that is not already in a will add the same encoding string again. The solution is simple: dedent Line [12] by one level, so it assigns the new character in both cases.
  • Line [13]: You return without adding the last run. Each iteration encodes the previous character, so the last character in the string has to be dealt with after the loop ends.
  • And last but not least, since you use / as a special character, you should deal with it somehow when it appears as a non-repeating character. For example, the plaintext /12a would be encoded as /12a and then decoded as a sequence of 12 a's.
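The symptoms described in the list can be reproduced by running the original function on small inputs. In this sketch the function body is copied from the question; only the name encode_original is an assumption, chosen so it does not clash with other versions of encode.

```python
def encode_original(plainText):
    # Body copied from the question, unchanged apart from the name.
    res = ''
    a = ''
    for i in plainText:
        if a.count(i) > 0:
            a += i
        else:
            if len(a) > 3:
                res += "/" + str(len(a)) + a[0][:1]
            else:
                res += a
                a = i
    return res

# The run "aaaa" is emitted twice because a is never reset,
# and "b" and "c" never reach the output.
print(encode_original("aaaabc"))  # /4a/4a

# The last character is dropped because nothing is flushed after the loop.
print(encode_original("abc"))     # ab
```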

And here is a (hopefully) working sample:

def encode (plainText):
    ## Lazy solution for some of the edge cases
    if plainText is None or len(plainText) == 0:
        return plainText

    ## Collect the pieces in a list and
    ## join them once at the end; this is
    ## faster than repeated str concatenation
    res=[]
    ## We only care about the last
    ## character, no need to save all
    prev_char=''
    ## And count it ourselves, it's
    ## faster than calling count
    count=0
    for i in plainText:
        if i == prev_char:
            ## If it's the same char
            ## increment count
            count += 1
            ## and then continue with the next
            ## cycle. Avoids unnecessary indentation.
            continue

        ## else
        if count > 2 or prev_char == '/':
            ## If more than 2 occurrences
            ## we build the encoding.
            ## For 3 occurrences the length is the same.
            ## '/' is a special character so we
            ## always encode it
            res.append(
                f"/{count}{prev_char}"
            )
        else:
            ## Otherwise just append the symbols
            res.append(prev_char*count)
        ## We observed at least 1 instance of i 
        count = 1
        ## Store for next comparison
        prev_char = i

    ## Now deal with last character.
    ## Without this your string would miss it.
    if count > 2 or prev_char == '/':
        res.append(
            f"/{count}{prev_char}"
        )
    else:
        res.append(prev_char*count)

    ## And build our string
    return ''.join(res)
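The answer gives no matching decoder. A minimal sketch of one follows, assuming the /<count><char> format produced above. The name decode is hypothetical, and the encoder is restated in compact form with itertools.groupby (a swapped-in technique, equivalent for non-empty strings) only so the block runs on its own. It also assumes that no run of digit characters is ever encoded: a run such as 1111 becomes /41, which cannot be told apart from 41 copies of whatever character follows.

```python
import re
from itertools import groupby

def encode(plain_text):
    # Compact restatement of the encoder above: groupby splits the
    # text into runs of identical characters.
    parts = []
    for char, group in groupby(plain_text):
        count = len(list(group))
        if count > 2 or char == '/':
            parts.append(f"/{count}{char}")
        else:
            parts.append(char * count)
    return ''.join(parts)

def decode(encoded):
    # Hypothetical decoder: "/<digits><char>" expands to <char>
    # repeated <digits> times; everything else is copied unchanged.
    # Assumes encoded runs never consist of digit characters.
    return re.sub(
        r"/(\d+)(.)",
        lambda m: m.group(2) * int(m.group(1)),
        encoded,
        flags=re.DOTALL,
    )

sample = "aaaab/cc"
print(encode(sample))          # /4ab/1/cc
print(decode(encode(sample)))  # aaaab/cc
```

Because every literal / is written as /1/, the decoder can treat each / as the start of an encoded run without any extra escaping rule.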


 