What to do when unescapable character(s) are escaped?

When designing a (mini-)language in which certain characters must be escaped to lose their special meaning (like quotes in some programming languages), what should be done, especially from a security perspective, when a character that is not escapable (e.g. a normal character that never has special meaning) is escaped? Should an error be raised, should the character be discarded, or should it appear in the output just as if it had not been escaped?

Example: in a simple language where strings are delimited by double quotes ( " ) and any quote inside a string is escaped with a backslash ( \ ), given the input "We \said, \"We want Moshiach Now\"", what should be done with the escaped letter s in said?
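To make the three options concrete, here is a minimal sketch in Python. The function name decode, the on_unknown parameter, and its policy names ("error", "drop", "keep") are made up for illustration, and it assumes that only the double quote and the backslash itself are escapable.

```python
# Hypothetical mini-language: only the double quote and the backslash may be escaped.
ESCAPABLE = {'"', "\\"}
POLICIES = ("error", "drop", "keep")


def decode(body, on_unknown="error"):
    """Decode the text between the string delimiters.

    on_unknown picks what happens when a non-escapable character is escaped:
    "error" rejects the input, "drop" discards the character,
    "keep" emits it as if it had not been escaped.
    """
    if on_unknown not in POLICIES:
        raise ValueError(f"unknown policy {on_unknown!r}")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of input")
        nxt = body[i + 1]
        if nxt in ESCAPABLE or on_unknown == "keep":
            out.append(nxt)
        elif on_unknown == "error":
            raise ValueError(f"invalid escape \\{nxt} at offset {i}")
        # on_unknown == "drop": emit nothing for this pair
        i += 2
    return "".join(out)


# The example input from the question, without the outer delimiters.
body = r'We \said, \"We want Moshiach Now\"'

for policy in POLICIES:
    try:
        print(policy, "->", decode(body, policy))
    except ValueError as exc:
        print(policy, "->", "rejected:", exc)
```

With "keep" the result reads We said, "We want Moshiach Now"; with "drop" the s disappears; with "error" the input is rejected at the backslash before s.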

I prefer the lexer to whine when this occurs. A lexer/parser should be tight about syntax; one can always loosen it up later. If you are sloppy, you'll find you can't retract a decision you didn't think you made.

Assume that you initially decide to treat "backslash not-an-escape" as that pair of characters, and that "T" is not an escape today. Some time later you decide to extend the language so that "\T" means something special, and you change your language.

You'll find an angry mob of programmers storming your design castle, because for them "\T" means "\" "T" (or "T", depending on your default decision), and you just broke their code. You hang your head in shame, retract the decision, and then realize... oops, there are no more available escape characters!

This lesson goes for any piece of syntax that isn't well defined in your language. If it isn't explicitly legal, it should be implicitly illegal, and your compiler should check it. Otherwise you'll never be able to extend your successful language.

If your language isn't going to be successful, you may not care as much.
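The breakage described above can be sketched in a few lines of Python. Everything here is hypothetical: decode_lenient, the two escape tables V1 and V2, and the choice of a tab as the new meaning of \T are only there to illustrate the scenario.

```python
def decode_lenient(body, escapes):
    """Lenient decoder: an unknown escape is kept as the backslash-character pair."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escape: substitute its value. Unknown: keep both characters.
            out.append(escapes.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# Version 1 of the hypothetical language: only \" and \\ are escapes.
V1 = {'"': '"', "\\": "\\"}
# Version 2 gives \T a meaning (a tab here, purely as an assumption).
V2 = dict(V1, T="\t")

source = r"C:\Temp"  # user code written against version 1

print(repr(decode_lenient(source, V1)))  # backslash and T survive as written
print(repr(decode_lenient(source, V2)))  # the same source now contains a tab
```

Had version 1 rejected \T outright, no existing program could have depended on its meaning, and version 2 could have introduced it without breaking anyone.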

Well, one way to solve the problem is for the backslash to simply mean a backslash when it precedes a non-escapable character. That's what Python does:

>>> print "a\tb"
a   b
>>> print "a\tb\Rc"
a   b\Rc
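The session above uses Python 2 syntax. As a hedged sketch for current Python 3: the value is still the same, but since Python 3.6 the compiler also emits a warning for unrecognized escapes such as \R (a DeprecationWarning at first, a SyntaxWarning from 3.12 on), which moves Python toward the stricter position. The snippet compiles the literal at run time so the warning can be captured; the filename "<example>" is arbitrary.

```python
import warnings

# Source text of a string literal with a valid escape (\t) and an unknown one (\R).
source = r'"a\tb\Rc"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(compile(source, "<example>", "eval"))

# The tab is decoded; the backslash before R is kept as a literal backslash.
print(repr(value))

# Which warning category shows up, if any, depends on the Python version.
categories = [w.category.__name__ for w in caught]
print(categories)
```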

Most systems take the escape character to mean "take the next character verbatim", so escaping a "non-escapable" character is usually harmless. The problem shows up later, when you get to comparisons and the like, where the literal text does not represent the actual value (that's where you see a lot of security issues, especially with things like URLs).

So on the one hand, you can accept only a limited set of escaped characters. In that sense you have an "escape sequence" rather than an escaped character (the \x is the entire sequence rather than a \ followed by an x). That's probably the safest mechanism, and it's not really burdensome to write.

The other option is to ensure that you are "canonicalizing" everything you compare, through some rule set. This typically means decoding all of the escape sequences properly up front, before any comparison, and comparing only the final values rather than the literals.
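A minimal sketch of the canonicalization idea, assuming the lenient "take the next character verbatim" rule. The names canonicalize, BLOCKED, is_blocked_naive and is_blocked are invented for this example, and a real system would have its own rule set.

```python
def canonicalize(literal):
    """Resolve escapes with the 'take the next character verbatim' rule."""
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])  # drop the backslash, keep the character
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


BLOCKED = {"admin"}  # hypothetical list of forbidden names


def is_blocked_naive(literal):
    # Compares the literal text, so an escaped spelling slips through.
    return literal in BLOCKED


def is_blocked(literal):
    # Compares the final value, after all escapes are resolved.
    return canonicalize(literal) in BLOCKED


sneaky = r"adm\in"  # decodes to the same value as "admin"
print(is_blocked_naive(sneaky))  # False
print(is_blocked(sneaky))        # True
```

The naive check and the consumer of the value disagree about what the string is, which is exactly the gap the canonicalization step closes.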

Most systems interpret the backslash as Will Hartung says, except before alphanumerics, which are variously used as aliases for control codes, character classes, word boundaries, the start of hex sequences, case region markers, hex or octal digits, etc. \s in particular often means whitespace in Perl 5 style regexes. JavaScript, which interprets it as 's' in one context and as whitespace in another, suffers from subtle bugs because of this choice. Consider /foo\sbar/ vs new RegExp('foo\sbar').
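The same two-layer effect can be reproduced outside JavaScript. The sketch below is a Python analogy, not the JavaScript case itself: it uses \b, which Python's string layer reads as a backspace character and the re module reads as a word boundary, so the meaning depends on which layer sees the backslash first.

```python
import re

# Raw string: the backslash reaches the regex engine, so \b is a word boundary.
from_raw = re.compile(r"\bcat")

# Plain string: the string layer decodes \b to a backspace (0x08) first,
# so the regex engine only ever sees a literal backspace followed by "cat".
from_plain = re.compile("\bcat")

text = "a cat"
print(from_raw.search(text) is not None)            # True
print(from_plain.search(text) is not None)          # False
print(from_plain.search("\x08cat") is not None)     # True
```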
