

Escape Windows's Path Delimiter

I need to change this string by escaping the Windows path delimiters. I don't define the original string myself, so I can't prepend the raw-string prefix 'r'.

I need this:

s = 'C:\foo\bar'

to be this:

s = 'C:\\foo\\bar'

Everything I can find here and elsewhere says to do this:

s.replace( r'\\', r'\\\\' )

(Why I should have to escape the character inside a raw string, I can't imagine.)
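The answer to that aside is that a raw string does not collapse `\\` into one character. A quick check, assuming Python 3 (the variable name `path` is just for illustration):

```python
# In a raw string, r'\\' is TWO backslash characters; '\\' (not raw) is ONE.
print(len(r'\\'), len('\\'))  # 2 1

# So s.replace(r'\\', r'\\\\') searches for a *pair* of backslashes.
# A path with single backslashes contains no such pair, so nothing changes.
path = r'C:\foo\bar'
print(path.replace(r'\\', r'\\\\') == path)  # True
```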

But printing the string results in this. Obviously something has decided to re-interpret the escapes in the modified string:

C:♀oar
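That output is consistent with the escapes having been interpreted when Python parsed the literal, before any `replace` ran: `\f` becomes a form feed (0x0C, which a Windows console using code page 437 draws as ♀) and `\b` becomes a backspace (0x08), which moves the cursor back so the following `a` overwrites the second `o`. A sketch that inspects the string directly, assuming Python 3:

```python
s = 'C:\foo\bar'

# repr() shows the control characters that the escapes turned into.
print(repr(s))      # 'C:\x0coo\x08ar'
print(len(s))       # 8
print('\\' in s)    # False: there is no backslash left to escape
```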

This would be so simple in Perl. How do I solve this in Python?

After a bunch of questions back and forth, the actual problem is this:

You have a file with contents like this:

C:\foo\bar
C:\spam\eggs

You want to read the contents of that file and use them as pathnames, and you want to know how to escape things.

The answer is that you don't have to do anything at all.

Backslash escape sequences are processed only in string literals, that is, in your source code. They are not processed in string objects that you read from a file, from input (in 3.x; in 2.x that's raw_input), and so on. So you don't need to escape those backslashes.
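To see that reading from a file leaves backslashes alone, here is a minimal sketch assuming Python 3. It writes a throwaway file with `tempfile` as a stand-in for a real paths file; the raw string is used only so that the *source code* of the sketch doesn't process escapes.

```python
import os
import tempfile

# Write a file whose text contains real backslash characters.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(r'C:\foo\bar' + '\n')
    path = tmp.name

# Reading it back does NOT interpret '\f' or '\b': the backslashes survive.
with open(path) as f:
    line = f.readline().rstrip('\n')
os.remove(path)

print(line)       # C:\foo\bar
print(len(line))  # 10
```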

If you think about it, you don't need to add quotes around a string to turn it into a string, and this is exactly the same case. The quotes and the escaping backslashes are both part of the string's representation, not the string itself.
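The difference between the string and its representation can be seen by comparing `print()` with `repr()`; a short sketch in Python 3:

```python
p = r'C:\foo\bar'     # 10 characters, two of which are backslashes

print(p)              # C:\foo\bar        (the string itself)
print(repr(p))        # 'C:\\foo\\bar'    (representation: quotes plus escaped backslashes)
print(p.count('\\'))  # 2
```

The doubled backslashes only appear in the representation; the string still contains one backslash at each position.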


In other words, if you save that example file as paths.txt and run the following code:

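with open('paths.txt') as f:
    # strip() removes the trailing newline; the backslashes are kept as-is.
    file_paths = [line.strip() for line in f]
# Raw string literals, so Python doesn't turn \f and \b into control characters.
literal_paths = [r'C:\foo\bar', r'C:\spam\eggs']
print(file_paths == literal_paths)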

… it will print True.


Of course, if your file was generated incorrectly and is full of garbage like this:

C:♀oar

Then there is no way to "escape the backslashes", because there are no backslashes left to escape. You can try to write heuristic code to reconstruct the original data that was supposed to be there, but that's the best you can do.

For example, you could do something like this:

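# Map each control character that a backslash escape could have produced
# back to the two-character escape text it most likely came from.
backslash_map = { '\a': r'\a', '\b': r'\b', '\f': r'\f',
                  '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v' }
def reconstruct_broken_string(s):
    # Heuristic only: guesses which escape produced each control character.
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s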
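A self-contained check of that heuristic, applied to the broken literal from the question (Python 3 assumed; the function is repeated so the snippet runs on its own):

```python
backslash_map = {'\a': r'\a', '\b': r'\b', '\f': r'\f',
                 '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def reconstruct_broken_string(s):
    # Replace each control character with the escape text it probably came from.
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s

broken = 'C:\foo\bar'          # parsed into 'C:\x0coo\x08ar'
fixed = reconstruct_broken_string(broken)
print(fixed)                   # C:\foo\bar
print(fixed == r'C:\foo\bar')  # True
```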

But this won't help if the original contained any hex, octal, or Unicode escape sequences. For example, 'C:\foo\x08' and 'C:\foo\b' both represent exactly the same string, so if you get that string, there's no way to know which one you're supposed to convert it back to. That's why the best you can do is a heuristic.
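A short sketch (Python 3) of that ambiguity: four different escape spellings of the backspace character all produce the same string, so the parsed value carries no trace of which one was written.

```python
b = 'C:\foo\b'        # named escape
a = 'C:\foo\x08'      # hex escape
c = 'C:\foo\10'       # octal escape (octal 10 == 8)
d = 'C:\foo\u0008'    # Unicode escape

print(a == b == c == d)  # True
print(repr(b))           # 'C:\x0coo\x08'
```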

Don't do s.replace(anything). Just stick an r in front of the string literal, before the opening quote, so you have a raw string. Anything based on string replacement would be a horrible kludge, since s doesn't actually have backslashes in it; your code has backslashes in it, but Python turns those escape sequences into other characters (here a form feed and a backspace), so they never become backslashes in the actual string.
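A minimal comparison of the raw literal with the two non-raw spellings, assuming Python 3:

```python
s = r'C:\foo\bar'             # raw literal: backslashes are kept as-is

print(s)                      # C:\foo\bar
print(s == 'C:\\foo\\bar')    # True: same value, spelled with escaped backslashes
print(s == 'C:\foo\bar')      # False: that literal holds a form feed and a backspace
```

One caveat with raw strings: a raw literal cannot end in a single backslash (r'C:\foo\' is a syntax error), so a trailing separator has to be added some other way.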

If the string actually has backslashes in it, and you want the string to have two backslashes wherever there once was one, you want this:

s = s.replace('\\', r'\\')

That'll replace every single backslash with two backslashes. If the string literally appears in the source code as s = 'C:\foo\bar', though, the only reasonable solution is to change that line. It's broken, and nothing you do to the rest of the code will fix it.
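A quick sketch (Python 3) of the doubling replacement on a string that really does contain backslashes:

```python
s = r'C:\foo\bar'                 # real backslashes, 10 characters
doubled = s.replace('\\', r'\\')  # '\\' is ONE backslash; r'\\' is TWO

print(doubled)                    # C:\\foo\\bar
print(len(s), len(doubled))       # 10 12
```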
