Decoding bytes to a string and adding it to a set splits the string into characters (Python)

Question

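I have a problem. I wrote code that reads a file in binary mode (for speed), then decodes what a regex match captures and stores it in a set. The problem is that the set takes these decoded strings and turns them into characters. Given "14(xx) 23(WP)", the regex captures WP and xx. What should happen is that WP and xx are each put into logbinset as a single element. Instead, the set becomes {'W', 'P', 'x', 'x'} rather than {"WP", "xx"}. I don't have this problem when I use a list.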

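However, I want to avoid a list because it keeps duplicates and I don't need duplicate values. Besides, sets are faster to read from and iterate over, and I don't need extra lines of code to make sure my list has no duplicates.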

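Why are my strings being split like this? I also tried adding the bytes without decoding them, but the set for some reason converts them to int. What is going on with my program and Python's set structure?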

import re

import pandas as pd


def odfs_bin_conversion_table_check(bintablecsv, filename):
    bincsv_df = pd.read_csv(bintablecsv)
    setbincsv_df = set(bincsv_df['MicronBin'])
    with open(filename, "rb", buffering=102400) as lines:
        regex = re.compile(rb"\d+\((.+)\)\s+\d+\((.+)\)")
        logbinset = set()
        logbinlist = []
        missingbins = ""
        for match in filter(bool, map(regex.search, lines)):  # keep only lines where the regex matched
            #logbinset.update(match.group(1))  # put matches inside logbinset
            logbinset.update((match.group(1)).decode('UTF-8','strict'))  # this is where the string ends up split
            logbinlist.append((match.group(1)).decode())
            print(match.group(1))
            #print((match.group(1)).decode() + " " + (match.group(1)).decode()) #visual check. Can be commented out
        for x in logbinset:
            print(x)
            if x not in setbincsv_df:
                print(type(x))
                #missingbins += x.decode() + ","
        if len(missingbins) > 0:
            return missingbins[:-1] + " are not in conversion table"

Answer 1

This has nothing to do with regular expressions or with reading the file in binary mode.

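set.update treats its argument as an iterable and adds each element of that iterable to the set. A string is iterable, and iterating over it yields its individual characters: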

>>> for x in 'WP':
...     print(x)
...
W
P

So with set.update, this produces a set of characters:

>>> s = set()
>>> s.update('WP')
>>> s
{'W', 'P'}
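
The same mechanism would explain the integers the question mentions when the bytes are passed without decoding: iterating over a bytes object yields integers (the byte values), so update adds those. A minimal sketch, using the literal b'WP' as a stand-in for a regex match group:

```python
# Iterating over a bytes object yields ints, one per byte.
s = set()
s.update(b'WP')  # adds 87 and 80, not b'WP'

# ord('W') == 87 and ord('P') == 80
print(s == {87, 80})
```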

To add the string "WP" to the set as a single item, use the add method:

>>> s = set()
>>> s.add('WP')
>>> s
{'WP'}
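
Applied to the loop in the question, this could look like the sketch below. The regex is the one from the question; the sample lines are made up for illustration and stand in for the lines of the binary file. Adding both capture groups is an assumption based on the question's description that both WP and xx should end up in the set.

```python
import re

# Same pattern as in the question: two "number(code)" fields per line.
regex = re.compile(rb"\d+\((.+)\)\s+\d+\((.+)\)")

# Hypothetical sample input standing in for the lines of the binary file.
lines = [b"14(xx) 23(WP)\n", b"no match here\n", b"7(WP) 9(AB)\n"]

logbinset = set()
for match in filter(bool, map(regex.search, lines)):
    # add() inserts each decoded string as one element instead of iterating it.
    logbinset.add(match.group(1).decode("utf-8"))
    logbinset.add(match.group(2).decode("utf-8"))

print(logbinset == {"xx", "WP", "AB"})
```

update still works when it is given an iterable of strings rather than a single string, for example s.update(['WP']), because then each element of the iterable is a whole string.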

Answer 1 (accepted), 2020-06-09 21:58:10