How to remove extra indentation of Python triple-quoted multi-line strings?

I have a Python editor where the user enters a script or code, which is then put inside a main method behind the scenes, with every line indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script also affects the string, inserting a tab at the start of every line. A problematic script can be as simple as:

"""foo
bar
foo2"""

So once it is inside the main method, it looks like:

def main():
    """foo
    bar
    foo2"""

and the string now has an extra tab at the beginning of every line after the first.
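To reproduce the effect, here is a minimal sketch of such an editor. The helper wrap_in_main, the variable name s, and running the result with exec are assumptions for illustration; the question does not say how the wrapped script is actually executed.

```python
import textwrap

# Hypothetical user script: assigns the multi-line string from the question to s.
user_code = 's = """foo\nbar\nfoo2"""\n'


def wrap_in_main(code: str) -> str:
    """Tab-indent every line of code and place it inside def main()."""
    return "def main():\n" + textwrap.indent(code, "\t") + "\treturn s\n"


namespace = {}
exec(wrap_in_main(user_code), namespace)
# The tab added to lines 2 and 3 has become part of the string value.
print(repr(namespace["main"]()))  # 'foo\n\tbar\n\tfoo2'
```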

textwrap.dedent from the standard library is meant for automatically undoing this kind of quirky indentation.
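Note that dedent only removes whitespace shared by every non-blank line, so it helps when the first line of the string is indented too. A common idiom (a sketch, not something the answer spells out) is to escape the newline right after the opening quotes so the text starts on an indented line:

```python
import textwrap


def main():
    # The backslash after the opening quotes suppresses the first newline,
    # so every line of the literal carries the same 8-space indentation.
    s = textwrap.dedent("""\
        foo
        bar
        foo2""")
    return s


print(repr(main()))  # 'foo\nbar\nfoo2'

# With the question's string the first line is unindented, so no whitespace
# is common to all lines and dedent leaves the value unchanged.
print(repr(textwrap.dedent("foo\n\tbar\n\tfoo2")))  # 'foo\n\tbar\n\tfoo2'
```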

From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.

The example below shows the differences:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
...     baz
...     foobar
...     foobaz
...     """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
... bar\tbaz
... """
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'

Note that inspect.cleandoc also expands internal tabs to spaces. This may be inappropriate for some use cases, but it works fine for me.
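Applied to the value the question ends up with (every line after the first carrying the editor's extra tab), cleandoc restores the original text. A minimal check:

```python
import inspect

# Value of the question's string after the editor indented every line by a tab.
indented = "foo\n\tbar\n\tfoo2"

# cleandoc expands tabs, removes the smallest indentation found on lines 2
# onward (here 8 columns), and strips the first line's leading whitespace.
print(repr(inspect.cleandoc(indented)))  # 'foo\nbar\nfoo2'

# Indentation beyond that common margin is preserved.
print(repr(inspect.cleandoc("foo\n\t  bar\n\tfoo2")))  # 'foo\n  bar\nfoo2'
```

One caveat: if the user intentionally indented every line after the first, cleandoc removes that indentation as well, since it cannot tell the editor's tab from the user's.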

Anything after the first line of a multi-line string is part of the string and is not treated as indentation by the parser. You may freely write:

def main():
    """foo
bar
foo2"""
    pass

and it will do the right thing.

On the other hand, that's not readable, and Python knows it. If the lines after the first in a docstring are indented, help() strips their common leading whitespace when it displays the docstring. Thus, help(main) and help(main2) below produce the same help info.

def main2():
    """foo
    bar
    foo2"""
    pass
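The same cleanup can be checked without the interactive pager: inspect.getdoc returns a function's docstring after running it through inspect.cleandoc. The sketch below repeats main and main2 from above; on Python 3.13 and later the compiler already strips this indentation from __doc__ itself, but getdoc gives the same result either way.

```python
import inspect


def main():
    """foo
bar
foo2"""
    pass


def main2():
    """foo
    bar
    foo2"""
    pass


# getdoc cleans each docstring, so main2's extra indentation disappears
# and both functions report the same text.
print(repr(inspect.getdoc(main)))   # 'foo\nbar\nfoo2'
print(repr(inspect.getdoc(main2)))  # 'foo\nbar\nfoo2'
```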

The only way I see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.
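A minimal sketch of this first approach, applied to the string value at run time. The function name strip_known_indent and the assumption that the indentation unit is a tab are illustrative; str.removeprefix requires Python 3.9+.

```python
def strip_known_indent(s: str, n: int, unit: str = "\t") -> str:
    """Remove n copies of unit from the start of every line after the first.

    The first line follows the opening quotes, so the editor never indented it.
    Lines that do not start with the full prefix are left unchanged.
    """
    prefix = unit * n
    first, *rest = s.split("\n")
    return "\n".join([first] + [line.removeprefix(prefix) for line in rest])


print(repr(strip_known_indent("foo\n\tbar\n\tfoo2", 1)))  # 'foo\nbar\nfoo2'
```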

If that indentation is not known beforehand, you can add a trailing newline to the string before inserting it, then read the number of tabs from its last line and strip that many from each line...
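A sketch of this second approach, assuming the user's string ended with a newline and the editor indented every line, so the string's last line holds nothing but the inserted whitespace. The name strip_measured_indent is hypothetical; str.removeprefix requires Python 3.9+.

```python
def strip_measured_indent(s: str) -> str:
    """Undo indentation whose width is read from the string's last line."""
    lines = s.split("\n")
    if len(lines) < 2:
        return s  # single-line string: nothing was indented
    indent = lines[-1]
    if indent.strip():
        raise ValueError("last line is not pure indentation")
    # The first line sits right after the opening quotes and was never indented.
    rest = [line.removeprefix(indent) for line in lines[1:]]
    return "\n".join([lines[0]] + rest)


print(repr(strip_measured_indent("foo\n\tbar\n\tfoo2\n\t")))  # 'foo\nbar\nfoo2\n'
```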

The third solution is to parse the code, find the beginning of each multi-line string, and not add your indentation to the lines after it until the string is closed.
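A sketch of this third approach that uses the standard tokenize module in place of hand-written parsing (a swapped-in technique; the answer does not name one). It assumes the user's code tokenizes cleanly. On Python 3.12+ multi-line f-strings are emitted as FSTRING_START/FSTRING_END tokens rather than STRING and are not handled here. The name indent_outside_strings is hypothetical.

```python
import io
import tokenize


def indent_outside_strings(source: str, prefix: str = "\t") -> str:
    """Prefix every line of source except lines that continue a multi-line string."""
    # 1-based numbers of lines that begin inside a multi-line string literal.
    inside = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.start[0] != tok.end[0]:
            inside.update(range(tok.start[0] + 1, tok.end[0] + 1))
    lines = source.splitlines(keepends=True)
    return "".join(
        line if number in inside else prefix + line
        for number, line in enumerate(lines, start=1)
    )


# Hypothetical user script; the editor then wraps it in def main().
user_code = 's = """foo\nbar\nfoo2"""\nreturn s\n'
namespace = {}
exec("def main():\n" + indent_outside_strings(user_code), namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```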

I think there is a better solution.

Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:

Behavior with the leading part not indented

import textwrap
import inspect

string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior with the leading part indented

string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """

print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'

I wanted to preserve exactly what is between the triple-quote lines, removing only the common leading indent. I found that textwrap.dedent and inspect.cleandoc didn't do it quite right, so I wrote this one. It uses os.path.commonprefix.

import re
from os.path import commonprefix

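def ql(s, eol=True):
    lines = s.splitlines()
    l0 = None
    if lines:
        # The first line follows the opening quotes; drop it if it is empty.
        l0 = lines.pop(0) or None
    # Character-wise common prefix of the remaining lines (including the line
    # holding the closing quotes), reduced to its leading whitespace.
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [l[n:] for l in lines]
    # With eol=False, drop the empty line left by the closing quotes.
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    s2 = "\n".join(lines2)
    return s2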

This can quote any string with any indent. I wanted it to include the trailing newline by default, with an option (eol=False) to remove it, so that it can quote any string neatly.

Example:

print(ql(r"""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))

print(ql(r"""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

In the second string only 4 spaces are common to all lines, because the final """ is indented less than the quoted text, so the output keeps the extra indentation:

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this was going to be simpler; otherwise I wouldn't have bothered with it!

I had a similar issue: I wanted my triple-quoted string to be indented in the code, but I didn't want the string to have all those spaces at the beginning of each line. I used re to solve it:

        print(re.sub(r'\n *', '\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
            MIME-Version: 1.0
            Subject: Get the reader's attention here!
            To: recipient@email.com

            --===============9004758485092194316==
            Content-Type: text/html; charset="us-ascii"
            MIME-Version: 1.0
            Content-Transfer-Encoding: 7bit

            Very important message goes here - you can even use <b>HTML</b>.
            --===============9004758485092194316==--
        """))

Above, I was able to keep my code indented while the string itself was essentially left-trimmed: all spaces at the beginning of each line were deleted. This was important, since any spaces in front of the SMTP or MIME-specific lines would break the email message.

The trade-off I made was leaving Content-Type on the first line, because the regex I was using didn't remove the initial \n (which broke the email). If it bothered me enough, I guess I could have added an lstrip() like this:

print(re.sub(r'\n *', '\n', f"""
    Content-Type: ...
""").lstrip())

After reading this 10-year-old page, I decided to stick with re.sub, since I didn't truly understand all the nuances of textwrap and inspect.
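One caveat worth seeing concretely: because the regex deletes every leading space, it also deletes indentation that is meaningful. In email headers, for example, a long header can be folded onto a continuation line that must begin with whitespace (RFC 5322). The header text below is made up for illustration, and inspect.cleandoc is shown only for contrast.

```python
import inspect
import re

# Hypothetical headers; the second line is a folded continuation of Subject
# and must keep its leading space to remain part of that header.
headers = """
    Subject: a very long subject line
     that continues on a folded line
    To: recipient@email.com
    """

# The regex from the answer removes every leading space, including the fold.
print(repr(re.sub(r"\n *", "\n", headers).lstrip()))

# cleandoc removes only the common 4-space margin, so the fold survives
# (it also drops the trailing newline).
print(repr(inspect.cleandoc(headers)))
```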

There is a much simpler way:

    foo = """first line\
             \nsecond line"""
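A quick check of what this trick actually stores: the backslash joins the two physical lines, so the spaces used to line up the second line stay in the string as trailing whitespace on the first line. Adjacent string literals are shown as an alternative (not part of the answer) that keeps source indentation out of the value.

```python
# The 4 spaces that line up the second physical line end up in the string,
# just before the explicit \n escape.
foo = """first line\
    \nsecond line"""
print(repr(foo))  # 'first line    \nsecond line'

# Adjacent literals are concatenated at compile time, so the indentation
# inside the parentheses never becomes part of the value.
bar = ("first line\n"
       "second line")
print(repr(bar))  # 'first line\nsecond line'
```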

This does the trick, if I understand the question correctly. Note that lstrip() removes all leading whitespace, so it will remove tabs as well as spaces.

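def dedent(message):
    # Strip all leading whitespace (spaces and tabs) from every line. Joining
    # with "\n" rather than os.linesep avoids "\r\r\n" when the result is
    # later written to a text-mode file on Windows.
    return "\n".join(line.lstrip() for line in message.splitlines())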

Example:

name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'. 
              Please add '{name}' entry to file
              '{config_file}'
              or export environment variable 'mqtt_{name}' before
              running the program.
           """

>>> print(message)
Missing env var or configuration entry for 'host'. 
              Please add 'host' entry to file
              '/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
              or export environment variable 'mqtt_host' before
              running the program.

>>> print(dedent(message))
Missing env var or configuration entry for 'host'. 
Please add 'host' entry to file
'/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
or export environment variable 'mqtt_host' before
running the program.

So if I understand correctly, you take whatever the user inputs, indent it properly, and add it to the rest of your program (and then run that whole program).

So after you put the user input into your program, you could run a regex that basically takes that forced indentation back. Something like: within triple quotes, replace every newline followed by four spaces (or a tab) with just a newline.
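A naive sketch of that regex, assuming four-space indentation and ordinary triple-quoted literals. It ignores escaped quotes, string prefixes, and triple quotes inside comments or other strings; the names TRIPLE_QUOTED and undo_forced_indent are hypothetical.

```python
import re

# A triple-quoted literal (either quote style), matched non-greedily across lines.
TRIPLE_QUOTED = re.compile(r"(\"{3}|'{3})(.*?)\1", re.DOTALL)


def undo_forced_indent(source: str, indent: str = "    ") -> str:
    """Inside each triple-quoted literal, turn newline + indent back into newline."""
    def fix(m: re.Match) -> str:
        quote, body = m.group(1), m.group(2)
        return quote + body.replace("\n" + indent, "\n") + quote
    return TRIPLE_QUOTED.sub(fix, source)


# The question's script after it was wrapped in main() and indented.
wrapped = 'def main():\n    s = """foo\n    bar\n    foo2"""\n    return s\n'
namespace = {}
exec(undo_forced_indent(wrapped), namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```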

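Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.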

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM