
String slicing in Python issue

I'm writing a simple program to decode a binary string given by:

bin_str = "101100001101100001"

At the start, if the first character is "1", then the next eight characters, "01100001", are decoded: I pass "01100001" into the function below to obtain its ASCII representation.

def convert_ascii(binary):
    c = chr(int(binary, 2))
    return c

Passing "01100001" into the above function yields "a", which is the first character decoded. Moving on, the next character, at index 9, is also "1", so the next eight characters, again "01100001", are decoded as well. Passing them to the function also yields "a".

lst = []
fixed_length = 8
i = 0
while i < len(bin_str):
    if binary[i] == "1":
        fl_bin = binary[i+1:fixed_length+1] #issue here
        ascii_rep = convert_ascii(fl_bin)
        lst.append(ascii_rep)
        i+=fixed_length+1

The problem I'm facing is slicing out the particular string of length 8, "01100001", from the original bin_str. I tried slicing with [i+1:fixed_length+1], but on the second pass fl_bin became "" instead of the next "01100001".

I would appreciate some help on this.

A nice way to do this is to create a regular expression that matches 1 followed by exactly eight 1 or 0 characters, and then use re.findall() to find all non-overlapping occurrences of this pattern in the string. By putting the leading 1 in a non-capturing group and the eight digits in a capturing group, you can even keep the initial 1 from being included in the results (although if you didn't do this, it's trivial to slice off that digit).

import re
reg_ex = "(?:1)([01]{8})"

bin_str = "101100001101100001"
ascii_rep = "".join(chr(int(byte, 2)) for byte in re.findall(reg_ex, bin_str))

As a bonus, this allows the groups in the source string to be separated (by spaces, or words, or anything that's not a 1 followed by eight 0s or 1s) for easier reading.
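
To illustrate the point about separators, here is a small sketch that applies the same pattern to a string whose groups are split by spaces and a word. The input spaced_str and the name decoded are made up for this example and are not from the question.

```python
import re

# Same pattern as above: a leading 1 (not captured) followed by eight binary digits (captured).
reg_ex = "(?:1)([01]{8})"

# Hypothetical input: two 9-character groups separated by a word for readability.
spaced_str = "101100001 and 101100010"

# findall returns only the captured eight-digit strings; each one is converted to a character.
decoded = "".join(chr(int(byte, 2)) for byte in re.findall(reg_ex, spaced_str))
print(decoded)  # ab
```

Anything between the groups is simply skipped, because it cannot match a 1 followed by eight binary digits.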

Use iter and next to cycle through the string: if next produces a 1, build a sublist from the next 8 items, join it and append it to the main list, and repeat until the iterator is exhausted.

bin_str = "101100001101100001" 
a = iter(bin_str) 
lst = []

while True:
    try:
        b = next(a)
        z = []
        if b == '1':
            for i in range(8):
                z.append(next(a))
            lst.append(''.join(z))
    except StopIteration:
        break

print(lst)
# ['01100001', '01100001']
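
For comparison, the while loop from the question can probably also be repaired directly. The slice [i+1:fixed_length+1] always ends at index 9, so on the second pass (i = 9) it becomes [10:9], which is empty; the end of the slice needs to move along with i. The sketch below assumes the input is well formed (every 1 marker is followed by a full eight digits). The function name decode, the variables start and end, and the choice to skip any character that is not a 1 marker are assumptions of this example, not part of the question.

```python
def convert_ascii(binary):
    # Interpret the binary string as an integer and map it to its character.
    return chr(int(binary, 2))


def decode(bin_str, fixed_length=8):
    chars = []
    i = 0
    while i < len(bin_str):
        if bin_str[i] == "1":
            start = i + 1                # first digit after the marker
            end = start + fixed_length   # slice end is relative to i, not a constant
            chars.append(convert_ascii(bin_str[start:end]))
            i = end                      # jump to the next marker
        else:
            i += 1                       # assumption: skip anything that is not a marker
    return "".join(chars)


print(decode("101100001101100001"))  # aa
```

Advancing i in the else branch also avoids the infinite loop the original code would hit if the character at position i were not "1".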


 