I have the following xterm output:
text = '\x1b[0m\x1b[01;32mattr\x1b[0m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'
I know that \x1b[0m resets all text attributes, \x1b[01m is for bold text, \x1b[32m is green text, and \x1b[01;32m is bold green text. How can I convert those escape sequences into my own tags? Like this:
\x1b[0m\x1b[01;32mattr --> <bold><green>attr</bold></green>
I want my text variable to become this:
text = '<bold><green>attr</bold></green>\n<bold><cyan>awk</bold></cyan>\n<bold><green>basename</bold></green>\n<bold><green>bash</bold></green>\nanytext'
import re
text = '\x1b[0m\x1b[01;32mattr\x1b[0m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'
# dictionary mapping text attributes to tag names
fmt = {'01':'bold', '32m':'green', '36m': 'cyan'}
# regex that gets all text attributes, the text and any potential newline
# (raw string, so the regex engine sees \n, \x1b and \[ unchanged)
groups = re.findall(r'(\n?)\x1b\[((?:(?:0m|32m|01|36m);?)+)([a-zA-Z ]+)', text)
# iterate through the groups and build your new string
xml = []
for group in groups:
    g_text = group[2] # the text itself
    for tag in group[1].split(';'): # the text attributes
        if tag in fmt:
            tag = fmt[tag]
        else:
            continue # unmapped attribute (e.g. the reset code 0m): no tag
        g_text = '<%s>%s</%s>' % (tag, g_text, tag)
    g_text = group[0] + g_text # add a newline if necessary
    xml.append(g_text)
xml_text = ''.join(xml)
print(xml_text)
<green><bold>attr</bold></green>
<cyan><bold>awk</bold></cyan>
<green><bold>basename</bold></green>
<green><bold>bash</bold></green>
any text
For a demo on the regex see this link: Debuggex Demo
Currently the regex assumes that the actual text contains only letters and spaces. Feel free to change the group ([a-zA-Z ]+) at the end of the regex to include any other characters that your text may contain.
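As a rough sketch of that change, the text group could be widened to accept anything that is not an escape character or a newline. The character class [^\x1b\n]+ and the sample string are my own assumptions for illustration, so adjust them to whatever your terminal output really contains.

```python
import re

# Hypothetical sample: names containing digits, dots, dashes and underscores
sample = '\x1b[01;32mfile-1.txt\x1b[0m\n\x1b[01;36mlib_v2\x1b[0m'

# Same pattern as in the answer, except that the text group now accepts
# any character other than ESC (\x1b) and newline
pattern = r'(\n?)\x1b\[((?:(?:0m|32m|01|36m);?)+)([^\x1b\n]+)'

groups = re.findall(pattern, sample)
for newline, attrs, body in groups:
    print(repr(newline), attrs, body)
```

Each tuple still has the same three fields (optional newline, attributes, text), so the loop that builds the tags would not need to change.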
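Also, I'm assuming you have more text attributes than bold, green, and cyan. You will need to update the fmt dictionary with your other attributes and their mappings, and add the same codes to the alternation (0m|32m|01|36m) in the regex, since it only matches the codes listed there.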
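A possible starting point for a larger mapping is sketched below. The codes 30 to 37 are the standard SGR foreground colours, but the tag names, the underline entry and the tag_for helper are illustrative assumptions rather than part of the code above. The helper strips the trailing m so that one key works whether the code is first or last in a sequence.

```python
# Standard SGR foreground colours 30-37; the tag names are illustrative choices
colours = ['black', 'red', 'green', 'yellow', 'blue', 'magenta', 'cyan', 'white']
fmt = {str(30 + i): name for i, name in enumerate(colours)}

# Bold and underline, with and without the leading zero
fmt.update({'01': 'bold', '1': 'bold', '04': 'underline', '4': 'underline'})

def tag_for(code):
    """Hypothetical helper: look up a tag name, ignoring a trailing 'm'."""
    return fmt.get(code.rstrip('m'))

print(tag_for('32m'), tag_for('36'), tag_for('01'), tag_for('0m'))
```

Codes that are not in the mapping, such as the reset code 0, come back as None, which the calling loop can treat as "no tag".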
EDIT
@Caaarlos has requested in the comments to keep the ANSI code as is in the output if it doesn't appear in the fmt dictionary:
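import re
text = '\x1b[0m\x1b[01;32;35mattr\x1b[0;7m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32;47mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'
# map attribute codes (without the trailing 'm') to tag names
fmt = {'01':'bold', '32':'green', '36': 'cyan'}
xml = []
active_tags = [] # tags that have been opened but not yet closed
for group in re.split(r'\x1b\[', text):
    if group.strip():
        # separate the codes from the text that follows them;
        # maxsplit=1 keeps text such as "10mb" from being split again
        codes, chunk = re.split(r'((?:\d+;?)+)m', group, maxsplit=1)[1:]
        not_found = []
        for tag in codes.split(';'):
            if tag in fmt:
                tag = fmt[tag]
                chunk = '<%s>%s' % (tag, chunk)
                active_tags.append(tag)
            elif tag == '0':
                # reset: close every tag that is still open
                for a_tag in active_tags[::-1]:
                    chunk = '</%s>%s' % (a_tag, chunk)
                active_tags = []
            else:
                not_found.append(tag)
        if not_found:
            # keep unmapped codes in the output as a raw escape sequence
            chunk = '\x1b[%sm%s' % (';'.join(not_found), chunk)
        xml.append(chunk)
xml_text = ''.join(xml)
print(repr(xml_text))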
'\x1b[35m<green><bold>attr\x1b[7m</bold></green>\n<cyan><bold>awk</bold></cyan>\n\x1b[47m<green><bold>basename</bold></green>\n<green><bold>bash\n</bold></green>any text'
Note that the edited code above also handles cases where the tag isn't closed directly after the text.
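If you need this in more than one place, the same idea could be wrapped in a function. The sketch below is a variation, not the edited code itself: the name ansi_to_tags is hypothetical, and it splits with a capturing group around the codes so that plain text and code strings alternate in the result. Tags are emitted in the order the codes appear, so the nesting order differs from the output shown above, and an empty code (\x1b[m) is treated as a reset.

```python
import re

FMT = {'01': 'bold', '32': 'green', '36': 'cyan'}

def ansi_to_tags(text, fmt=FMT):
    """Hypothetical helper: replace mapped SGR codes with tags, keep the rest."""
    out = []
    active = []  # tags opened so far and not yet closed
    # With a capturing group, re.split alternates plain text and code strings
    parts = re.split(r'\x1b\[([\d;]*)m', text)
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even index: plain text, copy through
            out.append(part)
            continue
        unknown = []
        for code in part.split(';'):
            if code in fmt:
                out.append('<%s>' % fmt[code])
                active.append(fmt[code])
            elif code in ('0', '00', ''):  # reset: close most recent tag first
                while active:
                    out.append('</%s>' % active.pop())
            else:
                unknown.append(code)
        if unknown:  # keep unmapped codes as a raw escape sequence
            out.append('\x1b[%sm' % ';'.join(unknown))
    return ''.join(out)

print(repr(ansi_to_tags('\x1b[01;32mbash\nmore\x1b[0mdone')))
```

Because the open tags are tracked in a list, a colour that spans a newline stays open until the next reset code arrives.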