Is there a quick way to return a string without its escape sequences in Python?

I want to be able to tell whether a string foo == 'some string'. This works most of the time. I realize, however, that there may be times when foo contains escape sequences such as '\n' or '\t', and I want to account for this. Is there anything quick, or built into Python 2.7, that will help me with this? Or will I have to essentially go through all escape sequences and make sure none of them are infesting my string foo?

Here's an example if you're still unsure:

foo = '\tZebra'

So when I print foo it appears as

    Zebra

and I can't easily be sure that there are no escape sequences such as '\t' when testing foo against a string literal:

foo == 'Zebra'

returns False.

What I thought of is using these lines:

if 'Zebra' in foo:
    bar()

but this accounts for MORE than just Python's escape sequences. For example:

foo = 'ttZebra'
if 'Zebra' in foo:
    print 'bar'

This will indeed print 'bar'.

So, how can I quickly remove all escape sequences from a string before I use it? Also, if this helps, I know that none of my strings will contain spaces because they all come from a .split() list.


Answer:

I tried using .strip(), and that helped, but my program still wasn't working. It turns out all of my files have UTF-8 BOMs. However, the BOM sequence is always the same, so it is very easy to deal with. I still use .strip() to account for the whitespace escape sequences (tabs, newlines and the like) at the start and end of each string.
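For the BOM part, here is a minimal sketch of two common ways to drop it, assuming the file content arrives as UTF-8 encoded bytes. The sample bytes are made up for illustration. One option is to decode with the utf-8-sig codec, which discards a leading BOM if there is one; the other is to compare against codecs.BOM_UTF8 and slice it off by hand.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a tab-indented word.
raw = codecs.BOM_UTF8 + b'\tZebra\n'

# Option 1: the 'utf-8-sig' codec drops a leading BOM while decoding.
text = raw.decode('utf-8-sig')

# Option 2: remove the BOM bytes by hand before decoding.
if raw.startswith(codecs.BOM_UTF8):
    raw = raw[len(codecs.BOM_UTF8):]
text_manual = raw.decode('utf-8')

# strip() then removes the surrounding tab and newline.
word = text.strip()
print(word == 'Zebra')        # True
print(text == text_manual)    # True
```

When reading from disk, io.open(path, encoding='utf-8-sig') applies the same codec, where path stands for whatever file is being read.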

If you're asking about stripping whitespace characters such as tabs and newlines from the beginning and end of the string, then use strip():

>>> foo = '\tZebra'
>>> foo.strip()
'Zebra'
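Applied to the comparison from the question, a sketch might look like the following. With no arguments, strip() removes only leading and trailing whitespace (tabs, newlines, spaces and similar), so unlike the `in` test it does not match 'ttZebra'. The helper name matches_word is made up for this example.

```python
def matches_word(candidate, expected):
    """Compare after removing leading/trailing whitespace from candidate."""
    return candidate.strip() == expected

print(matches_word('\tZebra', 'Zebra'))     # True
print(matches_word('Zebra\r\n', 'Zebra'))   # True
print(matches_word('ttZebra', 'Zebra'))     # False
print(matches_word('Ze\tbra', 'Zebra'))     # False: strip() leaves the middle alone
```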

If you want to remove them from the middle of the string as well, you can do the following:

>>> import re
>>> re.sub('[\x00-\x1F\x7F]', '', '\tZebra\tZebra')
'ZebraZebra'

The above regular expression strips out all ASCII control characters (0x00 through 0x1F, plus 0x7F).
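If this is done for many strings, the pattern can be compiled once and wrapped in a small function. This is only a sketch; the names CONTROL_CHARS and remove_control_chars are invented here. The character class covers just the ASCII control characters, so ordinary spaces are kept and a BOM is not removed by it.

```python
import re

# ASCII control characters: 0x00-0x1F plus DEL (0x7F).
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

def remove_control_chars(text):
    """Return text with every ASCII control character deleted."""
    return CONTROL_CHARS.sub('', text)

print(remove_control_chars('\tZebra\tZebra') == 'ZebraZebra')   # True
print(remove_control_chars('Ze\nbra\r\n') == 'Zebra')           # True
print(remove_control_chars('two words') == 'two words')         # True: spaces are kept
```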
