Getting the correct string length in Python for strings with ANSI color codes

Question

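I've got some Python code that automatically prints a set of data in a nice column format, including putting in the appropriate ANSI escape sequences to color various pieces of the data for readability.

I eventually end up with each line represented as a list, with each item being a column that is space-padded so that the same column on every line always has the same length. Unfortunately, when I actually go to print this, not all the columns line up. I suspect this has to do with the ANSI escape sequences, because the len function doesn't seem to recognize them: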

>>> a = '\x1b[1m0.0\x1b[0m'
>>> len(a)
11
>>> print(a)
0.0

So while each column is the same length according to len, the columns are not actually the same length when printed on screen.
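A minimal sketch of the mismatch, using illustrative variable names that are not part of the real code: padding helpers such as str.ljust rely on len, so they count the escape characters as if they took up space on screen.

```python
# Illustrative values only: the same text with and without SGR codes.
plain = '0.0'
bold = '\x1b[1m0.0\x1b[0m'

# str.ljust pads according to len(), which counts the 8 escape characters.
padded_plain = plain.ljust(12)
padded_bold = bold.ljust(12)

# Both results contain 12 characters, but the bold cell received only one
# space of padding, so it occupies 4 columns on screen instead of 12.
print(len(padded_plain), len(padded_bold))
print(repr(padded_bold))
```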

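Is there any way (short of some regex hackery, which I'd rather not do) to take the escaped string and find out what its printed length is, so that I can pad it appropriately? Maybe some way to "print" it back into a string and examine the length of that?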

Answer 1

The pyparsing wiki includes this helpful expression for matching ANSI escape sequences:

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

Here's how to make this into an escape-sequence stripper:

from pyparsing import *

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + Optional(delimitedList(integer,';')) + 
                oneOf(list(alphas)))

nonAnsiString = lambda s : Suppress(escapeSeq).transformString(s)

unColorString = nonAnsiString('\x1b[1m0.0\x1b[0m')
print(unColorString, len(unColorString))

This prints:

0.0 3

Answer 2

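I don't understand two things.

(1) It is your code, under your control. You want to add escape sequences to your data and then strip them out again so that you can calculate the length of your data? It seems much simpler to calculate the padding before adding the escape sequences. What am I missing?

Let's presume that none of the escape sequences change the cursor position. If they do, the currently accepted answer won't work anyway.

Let's assume that you have the string data for each column (before adding escape sequences) in a list named string_data, and that the pre-determined column widths are in a list named width. Try something like this: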

import sys

temp = []
for colx, text in enumerate(string_data):
    npad = width[colx] - len(text) # calculate padding size
    assert npad >= 0
    enhanced = fancy_text(text, colx, etc, whatever) # add escape sequences
    temp.append(enhanced + " " * npad)
sys.stdout.write("".join(temp))
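For a version that runs as written, here is a minimal sketch of the same idea. The bold helper is a hypothetical stand-in for fancy_text, and the sample data and widths are made up for illustration.

```python
import sys

def bold(text):
    # Hypothetical stand-in for fancy_text: wrap text in SGR bold and reset.
    return "\x1b[1m" + text + "\x1b[0m"

def render_row(string_data, width):
    """Pad each cell using its plain length, then add escape sequences."""
    cells = []
    for colx, text in enumerate(string_data):
        npad = width[colx] - len(text)  # padding computed from uncolored text
        if npad < 0:
            raise ValueError("column %d is too narrow for %r" % (colx, text))
        cells.append(bold(text) + " " * npad)
    return "".join(cells)

# Made-up sample row with column widths 6, 6 and 4.
row = render_row(["0.0", "12.5", "ok"], [6, 6, 4])
sys.stdout.write(row + "\n")
```

Because len is only ever applied to the uncolored text, nothing has to be stripped afterwards.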

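Update 1

After the OP's comment:

The reason I want to strip them out and calculate the length after the string contains the color codes is that all the data is built up programmatically. I have a bunch of colorize methods and I'm building up the data something like this: str = "%s/%s/%s" % (GREEN(data1), BLUE(data2), RED(data3)) It would be pretty difficult to color the text after the fact.

If the data is built up of pieces, each with its own formatting, you can still compute the displayed length and pad as appropriate. Here's a function that does that for one cell's contents: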

BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE = range(40, 48)
BOLD = 1

def render_and_pad(reqd_width, components, sep="/"):
    temp = []
    actual_width = 0
    for fmt_code, text in components:
        actual_width += len(text)
        strg = "\x1b[%dm%s\x1b[m" % (fmt_code, text)
        temp.append(strg)
    if temp:
        actual_width += len(sep) * (len(temp) - 1)  # separators between components
    npad = reqd_width - actual_width
    assert npad >= 0
    return sep.join(temp) + " " * npad

print(repr(
    render_and_pad(20, zip([BOLD, GREEN, YELLOW], ["foo", "bar", "zot"]))
    ))

If you think that call is burdened by punctuation, you could do something like this:

BOLD = lambda s: (1, s)
BLACK = lambda s: (40, s)
# etc
def render_and_pad(reqd_width, sep, *components):
    # etc

x = render_and_pad(20, '/', BOLD(data1), GREEN(data2), YELLOW(data3))
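The parts marked "# etc" are left open above. One possible completion, offered as a sketch rather than as the intended code, could look like this; the sample strings are placeholders for data1, data2 and data3, and the codes follow the 40-47 numbering used in the constants earlier.

```python
# Each helper tags its text with an SGR code instead of formatting it at once.
BOLD = lambda s: (1, s)
GREEN = lambda s: (42, s)   # same numbering as range(40, 48) above
YELLOW = lambda s: (43, s)

def render_and_pad(reqd_width, sep, *components):
    """Join (fmt_code, text) pairs with sep and pad to reqd_width columns."""
    rendered = []
    visible_width = 0
    for fmt_code, text in components:
        visible_width += len(text)
        rendered.append("\x1b[%dm%s\x1b[m" % (fmt_code, text))
    if rendered:
        # Separators are visible too, one between each pair of components.
        visible_width += len(sep) * (len(rendered) - 1)
    npad = reqd_width - visible_width
    if npad < 0:
        raise ValueError("content is wider than %d columns" % reqd_width)
    return sep.join(rendered) + " " * npad

x = render_and_pad(20, "/", BOLD("foo"), GREEN("bar"), YELLOW("zot"))
print(repr(x))
```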

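(2) I don't understand why you don't want to use the regular expression kit supplied with Python. No "hackery" (for any possible meaning of "hackery" that I'm aware of) is involved: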

>>> import re
>>> test = "1\x1b[a2\x1b[42b3\x1b[98;99c4\x1b[77;66;55d5"
>>> expected = "12345"
>>> # regex = re.compile(r"\x1b\[[;\d]*[A-Za-z]")
... regex = re.compile(r"""
...     \x1b     # literal ESC
...     \[       # literal [
...     [;\d]*   # zero or more digits or semicolons
...     [A-Za-z] # a letter
...     """, re.VERBOSE)
>>> print(regex.findall(test))
['\x1b[a', '\x1b[42b', '\x1b[98;99c', '\x1b[77;66;55d']
>>> actual = regex.sub("", test)
>>> print(repr(actual))
'12345'
>>> assert actual == expected
>>>

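Update 2

After the OP's comment:

I still prefer Paul's answer, since it's more concise

More concise than what? Isn't the following regex solution concise enough for you?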

# === setup ===
import re
strip_ANSI_escape_sequences_sub = re.compile(r"""
    \x1b     # literal ESC
    \[       # literal [
    [;\d]*   # zero or more digits or semicolons
    [A-Za-z] # a letter
    """, re.VERBOSE).sub
def strip_ANSI_escape_sequences(s):
    return strip_ANSI_escape_sequences_sub("", s)

# === usage ===
raw_data = strip_ANSI_escape_sequences(formatted_data)

[Above code corrected after @Nick Perkins pointed out that it didn't work]
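To get from stripping to the padding the question asks about, the same pattern can back a pair of small helpers. This is a sketch only; the names visible_len and ljust_visible are made up here and do not come from any library.

```python
import re

# Same pattern as above: ESC, "[", digits or semicolons, then a final letter.
ANSI_CSI = re.compile(r"\x1b\[[;\d]*[A-Za-z]")

def visible_len(s):
    """Length of s once the escape sequences are removed."""
    return len(ANSI_CSI.sub("", s))

def ljust_visible(s, width, fillchar=" "):
    """Like str.ljust, but measured against the visible length."""
    return s + fillchar * max(0, width - visible_len(s))

cell = ljust_visible("\x1b[1m0.0\x1b[0m", 6)
print(repr(cell), visible_len(cell))
```

Note that this counts characters, not terminal columns, so wide characters such as CJK text would need additional handling.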

Answer 3

Looking at ANSI_escape_code, the sequence in your example is Select Graphic Rendition (probably bold).

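Try controlling column positioning with the CUrsor Position (CSI n ; m H) sequence. That way, the width of the preceding text does not affect the current column position, and there is no need to worry about string widths.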
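A minimal sketch of that approach, assuming a terminal that honors CSI sequences. The helper names at and render_table, the sample rows, and the column positions are all invented for illustration.

```python
CSI = "\x1b["

def at(row, col, text):
    """Prefix text with a Cursor Position sequence (1-based row and column)."""
    return "%s%d;%dH%s" % (CSI, row, col, text)

def render_table(rows, col_starts, top=1):
    """Place every cell at a fixed column, whatever escapes the cell contains."""
    out = []
    for r, cells in enumerate(rows):
        for col, cell in zip(col_starts, cells):
            out.append(at(top + r, col, cell))
    return "".join(out)

# Made-up rows mixing colored and plain cells.
rows = [["\x1b[1m0.0\x1b[0m", "ok"], ["12.5", "\x1b[31mlate\x1b[0m"]]
screen = render_table(rows, col_starts=[1, 10])

# In a real terminal, clear the screen and draw the table with:
#   import sys; sys.stdout.write("\x1b[2J" + screen + "\n")
```

Absolute positioning overwrites whatever is already at those screen coordinates, so it suits full-screen output better than text that scrolls or is piped to a file.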

A better option, if you are targeting Unix, is to use the curses module's window objects. For example, a string can be positioned on the screen with:

window.addnstr([y, x], str, n[, attr])

This paints at most n characters of the string str at (y, x) with attributes attr, overwriting anything previously on the display.
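As a rough sketch of how a table could be drawn this way: draw_rows and main are hypothetical names, the cell data is made up, and only window.addnstr, curses.wrapper and the curses.A_* attributes come from the standard library.

```python
def draw_rows(window, rows, col_starts, col_widths, top=0):
    """Draw each (text, attr) cell at a fixed x position, clipped to its width."""
    for r, cells in enumerate(rows):
        for x, n, (text, attr) in zip(col_starts, col_widths, cells):
            window.addnstr(top + r, x, text, n, attr)

def main():
    import curses  # imported here because curses is not available everywhere

    def run(stdscr):
        rows = [[("0.0", curses.A_BOLD), ("ok", curses.A_NORMAL)],
                [("12.5", curses.A_NORMAL), ("late", curses.A_REVERSE)]]
        draw_rows(stdscr, rows, col_starts=[0, 10], col_widths=[8, 8])
        stdscr.refresh()
        stdscr.getch()  # wait for a key press before restoring the terminal

    curses.wrapper(run)

# Call main() from a real terminal to see the table.
```

Styling is passed as an attribute rather than embedded in the string, so the text never contains escape sequences and its length is always its display length.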

Answer 4

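If you're just adding color to some cells, you can add 9 to the expected cell width (5 hidden characters to turn on the color, 4 hidden characters to turn it off), e.g.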

import colorama # handle ANSI codes on Windows
colorama.init()

RED   = '\033[91m' # 5 chars
GREEN = '\033[92m' # 5 chars
RESET = '\033[0m'  # 4 chars

def red(s):
    "color a string red"
    return RED + s + RESET
def green(s):
    "color a string green"
    return GREEN + s + RESET
def redgreen(v, fmt, sign=1):
    "color a value v red or green, depending on sign of value"
    s = fmt.format(v)
    return red(s) if (v*sign)<0 else green(s)

header_format = "{:9} {:5}  {:>8}  {:10}  {:10}  {:9}  {:>8}"
row_format =    "{:9} {:5}  {:8.2f}  {:>19}  {:>19}  {:>18}  {:>17}"
print(header_format.format("Type","Trial","Epsilon","Avg Reward","Violations", "Accidents","Status"))

# some dummy data
testing = True
ntrials = 3
nsteps = 1
reward = 0.95
actions = [0,1,0,0,1]
d = {'success': True}
epsilon = 0.1

for trial in range(ntrials):
    trial_type = "Testing " if testing else "Training"
    avg_reward = redgreen(float(reward)/nsteps, "{:.2f}")
    violations = redgreen(actions[1] + actions[2], "{:d}", -1)
    accidents = redgreen(actions[3] + actions[4], "{:d}", -1)
    status = green("On time") if d['success'] else red("Late")
    print(row_format.format(trial_type, trial, epsilon, avg_reward, violations, accidents, status))

This gives a table in which the colored cells still line up under their headers.
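The fixed offset of 9 only holds for these particular codes. As a sketch, the hidden width can instead be computed from the codes themselves; colored_cell is a made-up helper name, not part of colorama.

```python
RED = '\033[91m'
GREEN = '\033[92m'
RESET = '\033[0m'

def colored_cell(text, width, color, align=">"):
    """Color text and pad it so that it occupies `width` visible columns."""
    hidden = len(color) + len(RESET)  # characters the terminal will not show
    return "{:{align}{w}}".format(color + text + RESET,
                                  align=align, w=width + hidden)

cell = colored_cell("0.95", 10, GREEN)
print(repr(cell), len(cell))
```

This keeps the format strings in terms of visible widths and still works if a longer code, such as a combined bold and color sequence, is used.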

Answer 1
11 accepted 2010-02-02 19:33:30

Answer 2
4 2010-02-02 22:55:38

Answer 3
1 2010-02-02 19:45:53

Answer 4
1 2016-12-02 12:51:10
