How to make a single-quoted string act like a double-quoted string in Ruby?
I have a file that contains HTML code in which the tags are encoded like this:
\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
The decoded HTML should be:
<div data-name="region-name" class="main-id">UK</div>
In Ruby, I used CGI.unescapeHTML from the cgi library, but it does not work: when it reads the content, it does not recognize the encoded tags. Here is another example:
require 'cgi'
single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"
puts 'unescape single_quoted_string ' + CGI.unescapeHTML(single_quoted_string)
puts 'unescape double_quoted_string ' + CGI.unescapeHTML(double_quoted_string)
The output of the previous code is:
unescape single_quoted_string \x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
unescape double_quoted_string <div data-name="region-name" class="main-id">UK</div>
My question is: how can I make single_quoted_string act as if its content were double-quoted, so that the function understands the encoded tags?
Thanks
Your problem has nothing to do with HTML. \x3c represents the character with hexadecimal code 3c in the ASCII table. In a double-quoted string, Ruby looks for such patterns and converts them to the characters they stand for; in a single-quoted string, the text is taken as written, so the backslash sequence stays as it is.
You can check for yourself that CGI is not doing anything:
CGI.unescapeHTML(double_quoted_string) == double_quoted_string
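The same check can be run on both kinds of literal. This is a minimal sketch using shortened versions of the strings from the question; the variable names are only illustrative. Neither string contains an HTML entity, so CGI.unescapeHTML returns each one unchanged, and the double-quoted one only looks decoded because the parser already replaced the escapes.

```ruby
require 'cgi'

single = '\x3cdiv\x3e'   # eleven literal characters, backslashes included
double = "\x3cdiv\x3e"   # the parser has already produced "<div>"

# Neither string contains an HTML entity, so unescapeHTML changes nothing.
single_unchanged = CGI.unescapeHTML(single) == single
double_unchanged = CGI.unescapeHTML(double) == double

puts single_unchanged   # true
puts double_unchanged   # true
puts double == '<div>'  # true
```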
The easiest way I know to solve your problem is with gsub:
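def convert(str)
  # Replace each \xNN escape with the character whose hex code is NN.
  # \h matches hex digits only, so sequences such as \xzz are left alone.
  str.gsub(/\\x(\h\h)/) do
    [Regexp.last_match(1)].pack("H*")
  end
end

single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
puts convert(single_quoted_string)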
What convert does is find every hex escape and pack its two hex digits into the corresponding character.
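To see the packing step in isolation, here is a small sketch that applies Array#pack with the "H*" directive to single pairs of hex digits. The variable names are illustrative only.

```ruby
# "H*" reads the string as a run of hex digits (high nibble first)
# and emits one byte for each pair of digits.
lt    = ['3c'].pack('H*')
quote = ['22'].pack('H*')

puts lt     # <
puts quote  # "
```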
Ruby's parser allows certain escape sequences in string literals.
The double-quoted string literal "\x3c" is recognized as containing a hexadecimal pattern \xnn, which represents the single character < (0x3C in ASCII).
The single-quoted string literal '\x3c', however, is treated literally, i.e. it represents four characters: \, x, 3, and c.
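The difference is easy to confirm by inspecting both literals. A minimal sketch, with variable names chosen only for illustration:

```ruby
double = "\x3c"  # escape processed by the parser
single = '\x3c'  # kept exactly as written

p double.length  # 1
p single.length  # 4
p double.chars   # ["<"]
p single.chars   # ["\\", "x", "3", "c"]
```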
"how can I make the single_quoted_string act as if its content is double-quoted"
You can't. In order to turn these four characters into < you have to parse the string yourself:
str = '\x3c'
str[2, 2] #=> "3c" take hex part
str[2, 2].hex #=> 60 convert to number
str[2, 2].hex.chr #=> "<" convert to character
You can apply this to gsub:
str = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
str.gsub(/\\x\h{2}/) { |m| m[2, 2].hex.chr }
#=> "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
/\\x\h{2}/ matches a literal backslash (\\) followed by x and two ({2}) hex characters (\h).
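Since the question mentions that the encoded HTML lives in a file, the following sketch applies the same gsub to text read at runtime, which behaves like a single-quoted literal because the parser never sees it. The helper name decode_hex_escapes and the temporary file are assumptions made for illustration, and the sketch assumes the escapes only encode ASCII characters.

```ruby
require 'tempfile'

# Hypothetical helper: replaces every \xNN escape with the character it names.
# Assumes NN is in the ASCII range (00-7f).
def decode_hex_escapes(text)
  text.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
end

decoded = nil

# Stand-in for the real file; the content is written with literal backslashes.
Tempfile.create('encoded_html') do |file|
  file.write('\x3cdiv class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e')
  file.flush
  decoded = decode_hex_escapes(File.read(file.path))
end

puts decoded  # <div class="main-id">UK</div>
```

Using a capture group keeps the block free of index arithmetic; the m[2, 2] form shown earlier is equivalent.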
Just for reference, a CGI encoded string would look like this:
str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
CGI.escapeHTML(str)
#=> "&lt;div data-name=&quot;region-name&quot; class=&quot;main-id&quot;&gt;UK&lt;/div&gt;"
It uses &...; style character references.
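For comparison, this is the kind of input CGI.unescapeHTML is meant for. A short round-trip sketch with a shortened version of the tag from the question; the variable names are illustrative:

```ruby
require 'cgi'

html    = '<div class="main-id">UK</div>'
escaped = CGI.escapeHTML(html)

puts escaped
# &lt;div class=&quot;main-id&quot;&gt;UK&lt;/div&gt;

# unescapeHTML reverses the entity encoding and restores the original.
puts CGI.unescapeHTML(escaped) == html  # true
```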